High speed digital FIR filter design by Lu, Shih-Lien et al.
AN ABSTRACT OF THE THESIS OF
Bo Zhou for the degree of Master of Science in Electrical & Computer Engineering
presented on December 2, 1996.
Title: High Speed Digital FIR Filter Design
Abstract approved:
Shih-Lien Lu
The objective of this thesis is to design a high speed digital FIR filter. The inputs of the
system come from a Delta-Sigma modulator. This FIR filter takes 1024 inputs,
multiplies them with their coefficients and adds the results. The main design task is to
take the input data, which are unweighted single-bit binary numbers at 156MHz,
multiply each bit with the corresponding coefficient and add them to get a weighted
multi-bit output at 20MHz.
©Copyright by Bo Zhou
 
December 2, 1996
 
All Rights Reserved
 High Speed Digital FIR Filter Design
 
by
 
Bo Zhou
 
A THESIS
 
submitted to
 
Oregon State University
 
in partial fulfillment of
 
the requirements for the
 
degree of
 
Master of Science
 
Completed December 2, 1996
 
Commencement March 1997
 Master of Science thesis of Bo Zhou presented on December 2, 1996
APPROVED:
Major Professor, representing Electrical & Computer Engineering
Chair of Department of Electrical & Computer Engineering
Dean of Graduate School
I understand that my thesis will become part of the permanent collection of Oregon
State University libraries. My signature below authorizes release of my thesis to any
reader upon request.
Bo Zhou, Author
ACKNOWLEDGEMENTS
 
With utmost respect and gratitude, I want to thank Dr. Shih-Lien Lu, my
supervisor. His broad knowledge and sharp thinking gave the initial idea for this project,
and his kindness and patience encouraged me through the project. I greatly appreciate
his support on this project and throughout my master's study.
Many thanks to Dr. Jack Kenney, Dr. David J. Allstot and Dr. James Welty 
for taking their precious time out of their busy schedules to serve on my graduate 
committee and to read the manuscript of this thesis. 
Thanks to Dr. Richard Schreier, Mr. Bo Zhang, Mr. Haiqing Lin, Mr. Bo Wang 
and Mr. Wenjun Su for their kind help with the thesis. 
Thanks to Ms. Rita Wells for her help with all the tedious paper work that makes 
this thesis possible. 
Thanks to Mr. John Seredich at Silicon Systems Incorporated for taking the effort 
to help me fix the format and present the thesis in such an elegant way. 
Thanks to Westinghouse Electric Co. for their financial support on the project. 
And finally, I want to say thank you to my husband and my little boy for all the 
precious things you give to me.
TABLE OF CONTENTS
Chapter 1.  Introduction
1.1 FIR Filter System
1.2 Linear Phase FIR
1.3 System Outline
1.4 Organization of the Document
Chapter 2.  Overall Structure of the System
2.1 Input Part
2.1.1 Shift Register
2.1.2 Latch Register
2.2 Pre-Processor Part
2.2.1 XOR
2.2.2 AND
2.3 Data Processor Part
2.3.1 Wallace Tree
2.3.2 Carry-Select and Carry-Look-Ahead Adder
2.4 Summary
Chapter 3.  Full Adder
3.1 Full Adder
3.2 The Comparison of Static CMOS Adder and Transmission Gate Adder
3.3 Static CMOS Full Adder Simulation
3.3.1 The Effect of Temperature on Circuit Performance
3.3.2 The Effect of Transistor Size on Circuit Performance
3.3.3 The Effect of Power Supply Variation on Circuit Performance
3.4 Summary
Chapter 4.  Design and Simulation of D Flip Flop
4.1 Sequential Circuits
4.2 Flip-Flops
4.3 D Flip-Flop
4.4 CMOS Static DFF Simulation
4.4.1 The Comparison of Three Kinds of Transmission Gate DFF
4.4.2 Working Performances with Different Sizes of Transistors
4.4.3 Timing Simulation About DFF
4.5 Summary
Chapter 5.  Function Blocks
5.1 Shift Register and Latch Register
5.2 Wallace Tree
5.2.1 Carry-Save Adders
5.2.2 Wallace Tree
5.2.3 Sign-Bit in Wallace Tree
5.2.4 Structure Trade-Off for Wallace Tree
5.3 Carry-Look-Ahead and Carry-Select Adder
5.4 Summary
Chapter 6.  Conclusion
6.1 Conclusion
6.2 Future Work
Bibliography
Appendices
Appendix A Hspice Simulation Results
Appendix B Wallace Tree Structures
Appendix C Schematics of the Wallace Trees
LIST OF FIGURES
1.1  Direct-Form Realization of FIR Filter Structure
1.2  Direct-Form Realization of Linear-Phase FIR System
2.1  Overall Structure for the System
3.1  Complementary CMOS Full Adder
3.2  Transmission Gate XOR
3.3  Transmission Gate Full Adder
4.1  Block Diagram for a Flip-Flop
4.2  Static CMOS D Flip-Flop
4.3  Test Circuit for Different Kinds of Transmission Gates
4.4  CMOS Static DFF with Improved Size Transistors
4.5  DFF Clock & Input Relation
5.1  Shift Register and Latch Register Block
5.2  Examples of Single Bit CSA
5.3  Counters
5.4  (3,2) and (7,3) Counter Structures
5.5  (15,4) Counter Structure
5.6  Example of Multi-Bit Wallace Tree
5.7  Three 20-bit Wallace Tree Structure
5.8  Seven 20-bit Wallace Tree Structure
5.9  Fifteen 20-bit Wallace Tree Structure
5.10  Fifteen 20-bit Wallace Tree with (3,2) to be the Basic Element
5.11  Carry-Look-Ahead & Carry-Select Adder
LIST OF TABLES
3.1  Truth Table for Full Adder
3.2  Stimulus to Simulate the Effect of Temperature on FA Delay
3.3  Delay of FA with Minimized Transistor Sizes
3.4  Delay of FA with Optimum Transistor Sizes
3.5  The Delay of FA with Vdd Variation
4.1  Speeds of DFF With Different Transmission Gate
LIST OF APPENDIX FIGURES
A.1  Full Adder with Minimum Sized Transistors
A.2  Full Adder Simulation with Minimum Size Transistors (T= -220)
A.3  Full Adder Simulation with Minimum Sized Transistors (25C, 5.5V)
A.4  Full Adder with Optimum Sized Transistors
A.5  DFF with Improved Sized Transistors
A.6  Static DFF Simulation (1)
A.7  DFF Simulation (2)
A.8  DFF Simulation (3)
A.9  DFF Simulation (4)
A.10  DFF Simulation (5)
B.1  MAC 01
B.2  MAC 02
B.3  MAC 03
B.4  MAC 04
B.5  MAC 11
B.6  MAC 12
B.7  MAC 21
B.8  MAC 22
B.9  MACRO 31
B.10  MACRO 41
B.11  MAC 3
B.12  MAC 4
B.13  MAC 5
B.14  MAC 6
B.15  MAC 7
B.16  MAC 8
B.17  W3N20L20
B.18  W3N2OB
B.19  W3N21L20
B.20  W3N26L20
B.21  W15N2OB
B.22  W9N2OB
B.23  Level III
C.1  FIR Top Level Schematic
C.2  Shift 1024
C.3  Shift 64
C.4  Latch 1024
C.5  Latch 64
C.6  XOR 512
C.7  XOR 64
C.8  M256
C.9  M32
C.10  M8
C.11  FA 24 or Counter (3,2)
C.12  Counter (7,3)
C.13  Counter (15,4)
C.14  Level I
C.15  W3N2OB
C.16  W15N2OB
C.17  W9N2OB
C.18  Level III
C.19  MAC 01
C.20  MAC 02
C.21  MAC 03
C.22  MAC 04
C.23  MAC 11
C.24  MAC 12
C.25  MAC 21
C.26  MAC 22
C.27  MAC 31
C.28  MAC 41
C.29  MAC 5
C.30  MAC 6
C.31  MAC 7
C.32  MAC 8
C.33  W30N20L20
C.34  W3N26L20
C.35  28-bit Carry-Look-Ahead & Carry-Select Adder
C.36  4-bit Carry-Select Adder
C.37  4-bit Carry-Look-Ahead Adder
 High Speed Digital FIR Filter Design 
Chapter 1. Introduction 
In the last two decades, Digital Signal Processing (DSP) has made enormous 
progress both in theory and practice. With the advancement of Very Large Scale 
Integrated Circuit (VLSI) technology, more and more applications are using DSP as the 
primary solution due to the reliability, reproducibility, compactness and efficiency of the 
digital technology. The objective of this thesis is to design  a high speed digital Finite 
Impulse Response (FIR) filter, which is part of a Delta-Sigma Analog to Digital 
Converter. This filter will be part of a digital decimator which is used to convert the high-
speed bit-stream output of the Delta-Sigma modulator into Nyquist-rate PCM data. 
Conceptually, this operation consists of two parts: lowpass filtering and down-sampling. 
For the sake of economy, the over-sampling ratio can be reduced in stages. Since a single-
stage filtering down-sampling block requires more than 10,000 taps to achieve our 
specification requirement, we employed the design of a multiple-stage decimator. This 
work describes the design and implementation of the first-stage decimator. The inputs of 
the system come from a Delta-Sigma modulator. This Finite Impulse Response filter
takes 1024 inputs, multiplies them by their coefficients and adds the results. The main
design task is to take the input data, which are unweighted single-bit binary numbers at
156 MHz, multiply each bit by the corresponding coefficient and add the products to get a
weighted multi-bit output at 20 MHz.
1.1. FIR Filter System 
The FIR filter is widely used in DSP. Mathematically, a sampled-data FIR filter is
represented by:
$$y(n) = \sum_{i=0}^{m-1} c_i \, x(n-i)$$
where
y(n) = the output of the filter
x(n) = the input signal stream
c_i = the ith coefficient of the filter
m-1 = the order of the filter (m is its length)
In general, we can view the equation as a computational procedure (an algorithm) 
for determining the output sequence y(n) of the system from the input sequence x(n). 
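As an illustration only, the computational procedure can be sketched in Python. The function name `fir_direct` and the choice to treat samples before the start of the stream as zero are assumptions made for this sketch, not part of the design.

```python
def fir_direct(coeffs, samples):
    """Direct-form FIR: y(n) = sum over i of c_i * x(n - i).

    Assumption for this sketch: x(k) = 0 for k < 0.
    """
    m = len(coeffs)
    out = []
    for n in range(len(samples)):
        acc = 0
        for i in range(m):
            if n - i >= 0:
                acc += coeffs[i] * samples[n - i]
        out.append(acc)
    return out


# A unit impulse at n = 0 reproduces the coefficients at the output.
print(fir_direct([1, 2, 3], [1, 0, 0, 0]))  # [1, 2, 3, 0]
```

Each output sample costs m multiplications and m-1 additions, which is why the rest of the design concentrates on cheap multiplication and fast multi-operand addition.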
There are a number of well-known forms for the sampled-data FIR filter. One
form, the so-called Direct-Form structure, is shown in Figure 1.1.
When implemented in hardware, an FIR filter is mainly composed of registers, adders
and multipliers. The precision of the adders and multipliers depends on
the precision of the coefficients, the length of the filter, and the desired precision of the result.
The multipliers can be a costly component. If the filter response is fixed and the
coefficients are therefore fixed, the multipliers may be simplified to contain only the
product terms required.
In fixed FIR filters, a great deal of signal-processing expertise goes into designing
coefficients that reduce the number of additions.
Figure 1.1  Direct-Form Realization of FIR Filter Structure
1.2 Linear Phase FIR 
An FIR filter has linear phase if its unit sample response satisfies the condition
$$c(n) = \pm c(M-1-n), \quad n = 0, 1, \ldots, M-1$$
The system we are dealing with is a linear phase FIR filter, so we can use this
symmetry or antisymmetry characteristic to simplify the system. The structure
shown in Figure 1.2 is therefore used for the design.
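The saving from symmetry can be sketched as follows. This is a hedged illustration, assuming an even filter length M and the standard condition c(n) = ±c(M-1-n); the function name `fir_linear_phase` and the zero padding outside the sample stream are assumptions of the sketch.

```python
def fir_linear_phase(half_coeffs, samples, sign=+1):
    """Folded FIR of even length M = 2 * len(half_coeffs).

    Assumes c(n) = sign * c(M - 1 - n), so only the first M/2
    coefficients are stored and each one is multiplied once.
    """
    half = len(half_coeffs)
    m = 2 * half

    def x(k):
        # Samples outside the stream are taken as zero (assumption).
        return samples[k] if 0 <= k < len(samples) else 0

    out = []
    for n in range(len(samples)):
        acc = 0
        for i in range(half):
            # Add (or subtract) the two mirrored inputs first, then multiply once.
            acc += half_coeffs[i] * (x(n - i) + sign * x(n - (m - 1 - i)))
        out.append(acc)
    return out


# Symmetric case: equivalent to the full coefficient set [1, 2, 2, 1].
print(fir_linear_phase([1, 2], [1, 0, 0, 0, 0]))  # [1, 2, 2, 1, 0]
```

Folding halves the number of multiplications, at the cost of one extra addition per coefficient pair before the multiply.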
Figure 1.2 Direct-Form Realization of Linear-Phase FIR System
1.3 System Outline 
The oversampled bit-stream is clocked at 10 GHz. An intermediate multiplexer 
reduces the one-bit-stream at 10 GHz into 64-bit parallel data at 156 MHz. The final 
output of the FIR filter will be clocked at 20MHz. As a result the system will have to 
work at a high speed. 
There are two main obstacles for the design. One is finishing the large number of
multiplications and additions within a relatively short period of time; the other is managing
the large amount of data flow. Several main design methods are used here to achieve the
above-mentioned goals. They are:
1. XOR reduces the number of additions at the first step.
2. AND gates implement the multiplications.
3. Wallace Tree structure reduces the large number of additions.
4. Combination of carry select and carry lookahead does the final addition.
5. Pipelined structure manages the data flows.
There are three basic circuit elements used in the design. They are: D flip-flop,
XOR, and full adder. Besides these three repeatedly used elements, there are some other
circuit macros, such as the carry-lookahead and carry-select macros, which are also used
in the implementation of the filter. In this design, since the whole circuit consists of a
large number of repeated macros, the optimality of each basic cell can contribute
greatly to the overall performance of the circuit. Using the minimum number of
transistors for the basic macros can reduce the parasitic capacitance of the circuit, which
not only lowers the power consumption but also reduces the delay. Spice
simulation and first-step layout have been done for the basic elements of the circuit in
order to estimate the overall chip area and circuit performance. Powerview was used to
set up the schematic structure for the design. Logic simulation has been done for the
macros of the system.
1.4 Organization of the Document 
Chapter 2 presents the overall structure of the system. Chapter 3 explains the
design of the Full Adder and discusses the effect of temperature variation, power supply
variation and transistor size optimization on its delay. Chapter 4 explains the design of
the D flip-flop and shows the simulation results of its main characteristics. Chapter
5 presents all the function blocks of the system and discusses the different structures of
the Wallace Tree. Chapter 6 gives the conclusion of the design and some suggestions for
future work.
Chapter 2. Overall Structure of the System 
A general electronic system is a black box that performs a desired input/output 
transformation. A digital system has its input and output coded in binary format. Like 
any other digital system, our design accepts certain binary inputs and produces certain 
outputs. The inputs for the system are 64 unweighted bits in parallel at 156MHz. The 
output for the system is one multi-bit word at 20MHz. The transformation of this system 
(156M) involves  taking these  64 
20M 
= 1024  inputs,  multiplying them with the 
corresponding 1024 20-bit coefficients, and adding the results together. The final output
is a 30-bit word. Digital systems are often advantageous because they can be
partitioned into modules. Each module can be built with cells which act as the building
blocks. The main modules of the system are: the input part, including the shift register
block and the latch register block; the pre-processor part, including the XOR block and
the AND block; and the addition part, including the Wallace Tree and the carry-look-ahead &
carry-select block.
[Block diagram, data flow from top to bottom:]
64 bits in parallel @ 156 MHz
  -> 1024 Shift Register @ 156 MHz
  -> 1024 Latch Register @ 20 MHz
  -> (1024 unweighted bit stream) MUX and AND Matrix
  -> (512 21-bit words) Positive Wallace Tree & Negative Wallace Tree
  -> (two negative 28-bit words, two positive 28-bit words) carry-look-ahead & carry-select adder
  -> 30 bit @ 20 MHz
Figure 2.1 Overall Structure for the System
Figure 2.1 is the block diagram for the overall structure of the system. These blocks can
be divided into three parts: the input part, the pre-processing part, and the processing part.
This chapter describes the blocks in each part individually.
2.1 Input Part 
The input part reads in the input data and converts them to a form that
is appropriate for the data processor of the system. The shift register block and the
latch register block are used in this part to achieve this goal.
2.1.1 Shift Register
The inputs of the system are 64 unweighted bits at 156 MHz in parallel. The
system needs to take in 1024 input data, then do the data processing every 50 ns
(@20 MHz). The shift register block reads in the 64 parallel inputs every 6.4 ns
(@156 MHz), shifts them down and stores them in 64 register levels. So the shift register
block is composed of 64 levels of register cells. Each cell contains 64 DFFs in parallel.
All the registers in this block work at 156 MHz.
2.1.2 Latch Register 
Since the clock for the output is 20 MHz, the 1024 inputs arriving at 156 MHz need to be
latched at 20 MHz and sent to the next stage in parallel. The latch register reads the
outputs of the 64 x 64 matrix shift register every 50 ns, and sends them to the next stage
in parallel.
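The interplay of the two register blocks can be illustrated with a small behavioral model. This is only a sketch: the class name `ShiftLatch`, its method names, and the toy sizes in the usage example are hypothetical, and the number of levels and the word width are left as parameters rather than fixed to the values in the text.

```python
from collections import deque


class ShiftLatch:
    """Behavioral model: a shift register of `levels` words feeding a latch."""

    def __init__(self, levels, width):
        self.width = width
        # A bounded deque drops the oldest word when a new one is pushed.
        self.shift = deque([[0] * width for _ in range(levels)], maxlen=levels)
        self.latched = [row[:] for row in self.shift]

    def fast_clock(self, word):
        """One fast-clock tick: accept a new parallel word, shift the rest down."""
        if len(word) != self.width:
            raise ValueError("word width mismatch")
        self.shift.appendleft(list(word))

    def slow_clock(self):
        """One slow-clock tick: snapshot every level for the next stage."""
        self.latched = [row[:] for row in self.shift]
        return self.latched


# Toy sizes: 3 levels of 2-bit words instead of the real register dimensions.
regs = ShiftLatch(levels=3, width=2)
for word in ([1, 0], [0, 1], [1, 1], [0, 0]):
    regs.fast_clock(word)
print(regs.slow_clock())  # [[0, 0], [1, 1], [0, 1]]
```

The latch decouples the two clock domains: the downstream logic sees a stable snapshot for a full slow-clock period while the shift register keeps accepting data.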
2.2 Pre-Processor Part 
The large amount of data is a big obstacle for the design. The pre-processor makes
use of the characteristics of the system and reduces the number of inputs before the main
data processor. In this way, the size of the addition part can be reduced dramatically. The
blocks in this part are the XOR block and the AND block. The XOR block reduces the
number of input data entering the filter; the AND block does the multiplication for the
filter.
2.2.1 XOR 
The coefficients of the FIR filter are symmetric, i.e. $C(i) = C(n-i)$, or
antisymmetric, i.e. $C(i) = -C(n-i)$. We can use this characteristic to reduce the
number of data that enter the processing part. If the inputs are symmetric, i.e.
$IN(i) \oplus IN(n-i) = 0$, the addition of the two multiplications
will give $2 \times C(i)$ or $2 \times C(n-i)$. If they are anti-symmetric, i.e.
$IN(i) \oplus IN(n-i) = 1$, the addition of the two multiplications will produce a zero.
Based on this, we add an XOR block before the filter itself to find out whether each
input pair is symmetric or anti-symmetric. This XOR block compares the input pairs in
parallel and drives the AND block.
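The XOR shortcut can be checked exhaustively for one pair of mirrored taps. The sketch below assumes that an input bit of 1 stands for the value +1 and a bit of 0 for -1, which is an assumption about the single-bit modulator encoding and is not stated in the text; the function names are illustrative.

```python
def pair_contribution(bit_a, bit_b, coeff):
    """Contribution of two mirrored taps that share the same coefficient."""
    if bit_a ^ bit_b:
        # Inputs differ: the two products cancel.
        return 0
    # Inputs agree: both products have the same sign, giving +/- 2 * coeff.
    sign = 1 if bit_a else -1
    return sign * 2 * coeff


def reference(bit_a, bit_b, coeff):
    """Straightforward evaluation, assuming bit 1 -> +1 and bit 0 -> -1."""
    def to_value(bit):
        return 1 if bit else -1
    return coeff * to_value(bit_a) + coeff * to_value(bit_b)


for a in (0, 1):
    for b in (0, 1):
        assert pair_contribution(a, b, 5) == reference(a, b, 5)
```

Under this encoding, each pair needs only one XOR and one doubled coefficient, so the number of words entering the adder tree is halved.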
2.2.2 AND 
The multiplication is done by bit-to-bit AND gates, which are equivalent to
2-to-1 muxes. The inputs of the mux are the individual bits of the coefficient x 2, which are
ready for the chip, and the outputs of the XORs.
2.3 Data Processor Part 
The Data Processor Part is the main part of the FIR filter. It contains the addition 
of all the outputs from the multiplications. It begins with the Wallace Tree structure, and 
uses the carry-look-ahead and carry-select adder to do the final addition of the output 
from the Wallace Tree. 
2.3.1 Wallace Tree 
The Wallace Tree provides the structure for the additions. Based on the
trade-off between manageability and speed, two basic counters, (15,4) [15-to-4] and
(3,2) [3-to-2], are used. It is assumed that the coefficients of the filter are evenly
split between positive and negative numbers.
The Wallace Tree cannot deal with the sign bit. So the data are grouped into positive
and negative numbers, and the sign bit information is kept for the final addition.
In practice, however, the signs of the filter coefficients are random. Duplicating this
design with some minor changes can accommodate this. Section 5.2.3 discusses this
in detail.
The Wallace Tree is composed of several levels that gradually reduce 1024 20-bit numbers
to four 28-bit numbers.
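The reduction idea can be sketched with the (3,2) counter alone. This is a simplified illustration, not the tree used in the design: it uses only (3,2) counters, handles non-negative integers only, and reduces all the way down to two words; the function names are hypothetical.

```python
def csa(a, b, c):
    """(3,2) counter applied bitwise: three words in, a sum word and a carry word out."""
    s = a ^ b ^ c
    carry = ((a & b) | (a & c) | (b & c)) << 1
    return s, carry


def wallace_reduce(words):
    """Reduce non-negative integers to at most two words using layers of CSAs."""
    words = list(words)
    while len(words) > 2:
        nxt = []
        full = len(words) - len(words) % 3
        for i in range(0, full, 3):
            nxt.extend(csa(words[i], words[i + 1], words[i + 2]))
        nxt.extend(words[full:])  # leftover words pass to the next layer
        words = nxt
    return words


partial = wallace_reduce(range(1, 16))
print(sum(partial))  # 120, the same as sum(range(1, 16))
```

No carry propagates inside a layer, so the delay of each layer is one full-adder delay regardless of the word length; only the final two-operand addition needs a carry-propagating adder.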
2.3.2 Carry-Select and Carry-Look-Ahead Adder 
The Carry-Select and Carry-Look-Ahead Adder takes the four 28-bit outputs from the
Wallace Tree, along with the sign bits, and calculates the sum of these four 29-bit
numbers. Carry-select and carry-look-ahead are used in this block to achieve higher
speed.
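For orientation, the carry-select principle can be sketched for two operands. This is the generic textbook technique under assumed parameters (a 28-bit word split into 4-bit blocks, two unsigned operands); it does not model the sign handling or the carry-look-ahead logic inside each block, and the function name is illustrative.

```python
def carry_select_add(a, b, width=28, block=4):
    """Add two unsigned `width`-bit integers block by block.

    Each block computes both candidate sums (carry-in 0 and carry-in 1);
    the real carry from the previous block only selects between them.
    """
    mask = (1 << block) - 1
    result, carry = 0, 0
    for shift in range(0, width, block):
        x = (a >> shift) & mask
        y = (b >> shift) & mask
        sum0 = x + y      # candidate assuming carry-in = 0
        sum1 = x + y + 1  # candidate assuming carry-in = 1
        chosen = sum1 if carry else sum0
        result |= (chosen & mask) << shift
        carry = chosen >> block
    return result | (carry << width)


print(carry_select_add(123456, 654321))  # 777777
```

In hardware both candidates are computed in parallel, so the critical path is one block addition plus a chain of selections rather than a full-length carry ripple.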
2.4 Summary 
This chapter briefly describes the overall structure of the digital filter. The
detailed functionality and performance will be described in later chapters.
Chapter 3.  Full Adder 
3.1 Full Adder 
A full adder is a combinational circuit that forms the arithmetic sum of three input
bits. It consists of three inputs and two outputs. Two of the input variables are the two
bits to be added. The third input is the carry out from the previous, lower-weighted
addition. These three inputs are equally weighted. Because the arithmetic sum of the
three binary digits ranges in value from 0 to 3, two binary bits are needed for the outputs.
One output has the same weight as the three input bits; the other has a higher
weight than the inputs. The truth table of the full adder is shown in Table 3.1.
Table 3.1  Truth Table for Full Adder
   Inputs       Outputs
A   B   C     S   COUT
0   0   0     0   0
0   0   1     1   0
0   1   0     1   0
0   1   1     0   1
1   0   0     1   0
1   0   1     0   1
1   1   0     0   1
1   1   1     1   1
The eight rows under the input variables designate all possible combinations of
1's and 0's that these variables may have. The 1's and 0's for the output variables are
determined from the arithmetic sum of the input bits. When all input bits are 0's, the
output is 0. The S output is equal to 1 when only one input is equal to 1 or when all three
inputs are equal to 1. The COUT output is 1 if two or three inputs are equal to 1.
So the truth table gives one simple definition of a full adder: a full adder counts
the number of ones in the inputs and gives that number in binary form. Thus, it is
actually a 3-to-2 counter. The schematic of a full adder using static CMOS
complementary gates is given in Figure 3.1.
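The counter interpretation is easy to verify. The following minimal sketch uses the standard Boolean expressions for the sum (three-input XOR) and the carry (majority function); the function name `full_adder` is illustrative.

```python
def full_adder(a, b, c):
    """Return (S, COUT) for three equally weighted input bits."""
    s = a ^ b ^ c                        # 1 when an odd number of inputs are 1
    cout = (a & b) | (a & c) | (b & c)   # 1 when two or three inputs are 1
    return s, cout


# The two outputs, read as a binary number, count the ones in the inputs.
for a in (0, 1):
    for b in (0, 1):
        for c in (0, 1):
            s, cout = full_adder(a, b, c)
            assert 2 * cout + s == a + b + c
```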
Figure 3.1 Complementary CMOS Full Adder
3.2 The Comparison of Static CMOS Adder and Transmission Gate Adder 
Besides the static CMOS full adder, a different implementation of adder uses a 
novel exclusive-or (XOR) gate. The schematic for this XOR gate is shown in Figure 3.2. 
Figure 3.2 Transmission Gate XOR
With XOR gates, inverters and transmission gates, a full adder may be implemented as
in Figure 3.3. The SUM ($A \oplus B \oplus C$) is formed by a multiplexer controlled by $A \oplus B$.
A second look at the truth table of the full adder shows that Carry = Cin
when $A \oplus B = 1$, and Carry = A (or B) when $A \oplus B = 0$.
This adder has 24 transistors, the same as the complementary one, but the Carry
and Sum have the same delay. In addition, the Sum and Carry signals are noninverted.
The disadvantage of this structure is that the delay is much worse than that of the static
CMOS design according to [10].
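The multiplexer formulation can be checked against ordinary addition. This is a logic-level sketch only, with a hypothetical function name; it says nothing about transistor-level behavior or delay.

```python
def tg_full_adder(a, b, cin):
    """Full adder expressed as two multiplexers controlled by p = a XOR b."""
    p = a ^ b
    s = (1 - cin) if p else cin   # sum = p XOR cin, written as a 2-to-1 mux
    carry = cin if p else a       # when p = 0, a equals b, so either can be passed
    return s, carry


for a in (0, 1):
    for b in (0, 1):
        for cin in (0, 1):
            s, carry = tg_full_adder(a, b, cin)
            assert 2 * carry + s == a + b + cin
```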
Figure 3.3 Transmission Gate Full Adder
3.3 Static CMOS Full Adder Simulation 
There are several factors that affect the circuit performance. Simulations are done 
with regard to the effect of temperature, transistor size and power supply variations. 
3.3.1 The Effect of Temperature on Circuit Performance
The circuit performance is largely decided by the I-V characteristic of the
transistors used in the circuit. The delay, for instance, is decided by the charging or
discharging current of the output node and the parasitic or real capacitance of that node.
For the MOSFET used in this design, the I-V characteristic is:
$$I_d = K \frac{W}{L}\left[(V_{gs}-V_t)V_{ds} - \frac{1}{2}V_{ds}^2\right], \quad 0 < V_{ds} < V_{gs}-V_t,\ V_{gs} > V_t$$
$$I_d = \frac{1}{2} K \frac{W}{L}(V_{gs}-V_t)^2, \quad V_{ds} > V_{gs}-V_t,\ V_{gs} > V_t$$
$$I_d = 0, \quad V_{ds} \ge 0,\ V_{gs} < V_t$$
In the above equations, $K = \mu C_{ox}$, where $\mu$ is the mobility of the carriers in the
MOSFET channel and $C_{ox}$ is the capacitance density (capacitance per unit area). Both $\mu$ and
$V_t$ are heavily temperature dependent, but they have opposite effects on the I-V
characteristic. There are two collision or scattering
mechanisms that dominate in a semiconductor and affect the carrier mobility: lattice
scattering and ionized impurity scattering. For the channel mobility of a MOSFET, lattice
scattering is the main mechanism because of the low impurity concentration. The lattice
scattering is related to the thermal motion of atoms.
To a first-order approximation,
$$\mu \propto T^{-3/2}$$
So the temperature coefficient for the channel mobility $\mu$ is negative, i.e. the
mobility increases as the temperature decreases.
On the other hand, the threshold voltage is a function of temperature too. The
absolute value of the threshold voltage decreases with an increase in temperature. This
variation is approximately -4 mV/°C for high substrate doping levels, and -2 mV/°C for
low doping levels.
So the effect of temperature on the circuit depends on the combination of the
two factors. In order to get a rough feeling for the effect of these two factors, the
following calculation compares the time for an inverter's output node to drop from
5 V to 4 V at 25°C and at -220°C. When the output of an inverter drops from 5 V to 4 V,
the PMOS is in the cutoff region while the NMOS is in its saturation region. The saturation
current of the NMOS provides the discharge current for the node.
$$I_{sat} = \frac{1}{2}\,\frac{W}{L}\,K\,(V - V_t)^2$$
Where: $W/L = 3/2$
$K = \mu C_{ox} = 84\ \mu\mathrm{A/V^2}$
$(V - V_t) = 5 - 0.7 = 4.3\ \mathrm{V}$
So $I_{sat} \approx 1200\ \mu\mathrm{A}$.
For the discharge current, $\Delta t = \Delta Q / I = (\Delta V \cdot C)/I$. If the load capacitance is
50 fF (a typical gate capacitance for an inverter), the discharge time is approximately 0.04
ns at room temperature. If the temperature is -220°C, according to the above
discussion of the temperature effect on $V_t$ and $\mu$, $V_t$ will be approximately 1.5 V, while
$K$ will be $84 \times \frac{53^{-3/2}}{300^{-3/2}} \approx 1234\ \mu\mathrm{A/V^2}$, so the resulting $I_{sat} \approx 11000\ \mu\mathrm{A}$,
which will reduce the discharging time by a factor of about 10.
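The arithmetic of this estimate can be reproduced with a few lines of Python. The sketch takes the values quoted in the text as given (W/L = 3/2, K = 84 and 1234 µA/V², threshold voltages of 0.7 V and 1.5 V, a 50 fF load and a 1 V swing); the function and variable names are illustrative.

```python
def saturation_current(k, w_over_l, v_gs, v_t):
    """I_sat = (1/2) * (W/L) * K * (V_gs - V_t)^2, in amperes."""
    return 0.5 * w_over_l * k * (v_gs - v_t) ** 2


def discharge_time(delta_v, cap, current):
    """delta_t = delta_V * C / I, in seconds."""
    return delta_v * cap / current


# Room temperature: K = 84 uA/V^2, V_t = 0.7 V.
i_room = saturation_current(84e-6, 1.5, 5.0, 0.7)
t_room = discharge_time(1.0, 50e-15, i_room)

# -220 C: K = 1234 uA/V^2 and V_t = 1.5 V, both taken from the text.
i_cold = saturation_current(1234e-6, 1.5, 5.0, 1.5)
t_cold = discharge_time(1.0, 50e-15, i_cold)

print(f"I_sat (room) = {i_room * 1e6:.0f} uA, t = {t_room * 1e9:.3f} ns")
print(f"I_sat (cold) = {i_cold * 1e6:.0f} uA, speed-up = {t_room / t_cold:.1f}x")
```

With these inputs the room-temperature current comes out near 1.2 mA and the discharge time near 0.04 ns, and the cold case is roughly ten times faster, in line with the estimate above.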
From the above discussion, it is obvious that the temperature drop will speed up 
the circuit. Simulations use the following stimulus under different temperatures to verify 
the effect. 
Table 3.2 Stimulus to Simulate the Effect of Temperature on FA Delay
ain  bin  cin  time point  sum  cout 
0  0  0  10 ns  0  0 
1  0  0  20 ns  1  0 
1  1  0  30 ns  0  1 
1  1  1  40 ns  1  1 
1  1  0  50 ns  0  1 
0  1  0  60 ns  1  0 
0  1  1  70 ns  0  1 
0  1  0  80 ns  1  0 
Table 3.3 Delay of FA with Minimized Transistor Sizes
                      T=25 C                T=-220 C
                      a (b)      cin        a (b)      cin
sumb    up edge       0.97 ns    0.80 ns    0.68 ns    0.80 ns
        down edge     0.21 ns    0.28 ns    0.11 ns    0.09 ns
coutb   up edge       1.38 ns    1.56 ns    1.40 ns    1.60 ns
        down edge     0.29 ns    0.35 ns    0.16 ns    0.18 ns
From Table 3.3, most delays decrease as the temperature goes down; the coutb
up-edge delay is the exception and stays roughly constant.
3.3.2 The Effect of Transistor Size on Circuit Performance
The sizes of the transistors in the circuit affect the circuit performance because
they affect the I-V characteristic of the MOSFET. Generally speaking, a larger transistor
gives a larger current at the same node voltages, which means a stronger driving
ability. But a larger transistor also means a larger load for the stages that drive it. So the
effect of the transistor sizes on the circuit performance is not a straightforward
relation.
Table 3.4 Delay of FA with Optimum Transistor Sizes
                      T=25 C                T=-220 C
                      a (b)      cin        a (b)      cin
sumb    up edge       0.76 ns    0.65 ns    0.68 ns    0.60 ns
        down edge     0.14 ns    0.25 ns    0.10 ns    0.09 ns
coutb   up edge       1.12 ns    1.22 ns    1.08 ns    1.30 ns
        down edge     0.29 ns    0.25 ns    0.16 ns    0.11 ns
The simulation results show that the speed of the full adder depends on the sizes of
the transistors in the circuit. The "optimum size" used here is taken from [1]. This optimum
may not be the best choice for this circuit, but it does show that the size of the transistors
affects the circuit performance. A variety of software packages have been developed to
aid in the optimization of transistor sizes.
3.3.3 The Effect of Power Supply Variation on Circuit Performance 
In an IC chip, variation of the power supply is normal and is usually within
a range of 10%.
Generally speaking, an increase in the power supply means a larger current for the
same transistor, which makes the circuit faster. Simulation is done using the same
stimulus shown in Table 3.2 while VDD is changed from 4.5 V to 5.5 V.
The simulation results are shown in Table 3.5.
Table 3.5 The Delay of FA with Vdd Variation
                  Input A or B                    Carry Input
Output  Edge      4.5 V     5.0 V     5.5 V      4.5 V     5.0 V     5.5 V
Sb      Up        1.13 ns   0.98 ns   0.85 ns    0.98 ns   0.80 ns   0.65 ns
        Down      0.24 ns   0.22 ns   0.17 ns    0.29 ns   0.28 ns   0.27 ns
Cb      Up        1.75 ns   1.39 ns   1.13 ns    1.95 ns   1.58 ns   1.30 ns
        Down      0.30 ns   0.29 ns   0.27 ns    0.28 ns   0.35 ns   0.30 ns
Simulation results verify that the delay of the circuit decreases as the power
supply goes up.
3.4 Summary 
From the simulation and discussions in this chapter, the complementary CMOS
full adder is the choice for the design. The average delay of this full adder is approximately
0.8 ns using the 0.8 um CK process at SSI. The delay of the circuit decreases as the
temperature goes down. Optimizing the sizes of the transistors in the circuit
will increase the performance of the circuit, which can be achieved with software
simulation. The power supply variation affects the circuit performance: as the power
supply goes up, the delay of the circuit drops. But with all the variations, the full adder
used in the design can have a delay of less than 1.2 ns, which is equivalent to about 800 MHz.
Chapter 4. Design and Simulation of D Flip Flop 
4.1 Sequential Circuits 
Sequential circuits have memory, so their output signals can be functions not
only of the present input signals but also of past ones. Often, in sequential circuits,
output signals are fed back. Thus, an output signal can be a function not only of past
input signals but also of past output signals. A sequential circuit can therefore be
considered to consist of the interconnection of a combinational circuit and a memory.
Sequential circuits are so named because they allow operations to be performed
in sequence. They are usually slower than combinational circuits because
the operations have to be performed in sequence. However, the modern large digital
computer, and even most small computer applications, must have memory to function
properly. Thus, sequential circuits are of prime importance in modern digital devices.
Sequential circuits are classified into two types, synchronous and asynchronous.
In synchronous sequential circuits, the signals only change their values at discrete
times, that is, they all change in synchronism. Pulses are generated by a device called a
master clock. The master clock pulses synchronize the operation of all the devices
within the digital system. In general, different types of digital circuits respond at
different rates. The rate at which the master clock generates pulses must be slow enough
to permit the slowest circuit to respond. This then limits the speed of all circuits.
In an asynchronous sequential circuit, each device responds at its own rate.
Therefore, in general, asynchronous circuits are considerably faster than synchronous ones.
The system designed here is a synchronous sequential circuit with an input rate
of 156 MHz, while the rate for the overall circuit is 20 MHz.
4.2 Flip-Flops 
A very basic sequential circuit is called a flip-flop or latch. This is a digital device 
whose output remains constant (i.e., either a 0 or 1) until it is switched in response to its 
input signal. This accounts for its name: that is, it flips or flops from one possible output 
to the other and remains there until flipped back, or, equivalently, latches at one output 
until changed by the input. 
There are different kinds of flip-flops. A general block diagram representation is 
shown in Figure 4.1. 
[Block diagram: one or more inputs; outputs Q and QB]
Figure 4.1 Block Diagram for a Flip-Flop.
The value of the output marked Q is called the state of the flip-flop. That is, when 
Q=1, then the state is 1 and when Q=0, the state is 0. Flip-flops usually have two outputs, 
one equals the state and the other equals its complement. The complement output is 
marked as QB.
Once the state of a flip-flop is set, it remains this way until changed by the input. 
Thus, a flip-flop "remembers" its inputs. 
4.3 D Flip-Flop 
The D (delay) flip-flop has only one input. The levels of a clock, CLK, are used 
to drive the D flip-flop (DFF) to either the storage state or the input state. If D is the input 
signal, Q and Q' are the present and next states, and CLK and \overline{CLK} are the clock 
and its complement, the state equations for the positive and negative level-sensitive 
latches can be expressed as 
Q' = D \cdot CLK + Q \cdot \overline{CLK} 
and  Q' = D \cdot \overline{CLK} + Q \cdot CLK 
The first equation describes a latch which passes the input data when CLK = 1 
and stores it when CLK = 0. Inversely, the second equation describes a 
complementary latch, which receives input data when CLK = 0 and stores it when 
CLK = 1. 
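The two latch state equations can be checked with a short behavioral sketch. The function names positive_latch and negative_latch are illustrative only and are not taken from the design; signals are modeled as the integers 0 and 1, and no circuit delay is modeled.

```python
def positive_latch(d: int, q: int, clk: int) -> int:
    """Next state Q' = D*CLK + Q*(not CLK): transparent while CLK = 1."""
    return (d & clk) | (q & (1 - clk))


def negative_latch(d: int, q: int, clk: int) -> int:
    """Next state Q' = D*(not CLK) + Q*CLK: transparent while CLK = 0."""
    return (d & (1 - clk)) | (q & clk)


if __name__ == "__main__":
    # Print the full truth table: clk, d, q, then the next state of each latch.
    for clk in (0, 1):
        for d in (0, 1):
            for q in (0, 1):
                print(clk, d, q,
                      positive_latch(d, q, clk),
                      negative_latch(d, q, clk))
```

In every row of the table exactly one of the two latches follows D while the other keeps Q, which is the property the master-slave structure relies on.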
If two complementary latches are connected in series, one will be in the storage 
state while the other is in the input state, and a 'non-transparent' edge-triggered flip-flop is 
formed. One well-known structure is called the master-slave D flip-flop. The master 
stage reads in the input state when the clock is low (high), while the slave stage passes 
the input state to the output when the clock is high (low). 
One structure that makes use of the transmission gate is shown in Figure 4.2. As 
can be seen from the circuit, the master and slave stages remain isolated as long as 
clk1 and clk2 are not high simultaneously. The feedback loop keeps the 
state of the master and slave stages latched as long as the clock does not have another 
up-edge or down-edge. At the up-edge of the clock, the master stage reads the input, while 
the slave stage passes the input state to the output at the down-edge of the clock. So this 
kind of flip-flop is called an edge-triggered DFF. 
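The master-slave behavior can be illustrated with a small behavioral model. This is a sketch under stated assumptions: the master is taken to be transparent while the clock is high and the slave while it is low, so the output changes after the down-edge; the class name MasterSlaveDFF and its apply method are hypothetical, and no transistor-level effects are modeled.

```python
class MasterSlaveDFF:
    """Two complementary latches in series (behavioral model only)."""

    def __init__(self) -> None:
        self.master = 0  # state held by the master latch
        self.q = 0       # state held by the slave latch (the DFF output)

    def apply(self, d: int, clk: int) -> int:
        """Apply one clock level with input d and return the output Q."""
        if clk == 1:
            # Master is transparent, slave holds its old state.
            self.master = d
        else:
            # Master holds, slave passes the master state to the output.
            self.q = self.master
        return self.q


if __name__ == "__main__":
    ff = MasterSlaveDFF()
    for d, clk in [(1, 1), (0, 0), (0, 1), (1, 0)]:
        print(d, clk, ff.apply(d, clk))
```

Because only one latch is transparent at any clock level, a change on D never reaches Q within the same clock phase.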
Figure 4.2  Static CMOS D Flip-Flop (transmission gates clocked by clk1 and clk2; output OUT) 
4.4 CMOS Static DFF Simulation 
4.4.1 The Comparison of Three Kinds of Transmission Gate DFF 
The transmission gate in the above structure can be a single NMOS transistor, a single 
PMOS transistor or a CMOS transmission gate. Due to the working characteristics of 
these three kinds of gates, their performance and highest working frequency 
will be very different. 
In order to find out the working characteristics of the NMOS, PMOS and CMOS 
transmission gates, the following setups are used. 
Figure 4.3 Test Circuit for Different Kinds of Transmission Gates (signals CLK, VIN, VINP, VOUTP, VINN, VOUTN) 
For the NMOS transmission gate, when CLK = 0, the gate is open and 
VOUTN keeps its old state. When CLK = VDD, the gate is closed and the stable voltage 
of VOUTN depends on the voltage level of VINN. If VINN = 0, the resulting 
VOUTN = 0; if VINN = VDD, the resulting VOUTN = VDD - VT. Most 
processes are n-well processes, so the bulk node of the NMOS can only be tied to 0 v, which 
means VBS = VDD - VT = 4.3 v. This VBS forms a negative feedback on 
VINN (body-effect): 
V_t = V_{t0} + \gamma ( \sqrt{V_{bs} + 2\phi} - \sqrt{2\phi} ) 
where V_{t0} = 0.7 v and \phi = 0.3 v. So the resulting V_t = 1.5 v, and this pulls 
VINN to 3.5 v. So the NMOS transmission gate can only transfer voltages between 0 and 
3.5 v. If this kind of transmission gate is used in the circuit, the noise margin for the next 
stage will be largely reduced. Even worse, if these kinds of transmission gates were 
used in series, this body-effect would become worse and worse along the transmission 
gate chain, and ultimately kill the initial signal. 
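The threshold shift can be reproduced numerically with the standard body-effect expression. In this sketch the body-effect coefficient gamma = 0.556 is an assumption, back-solved so that the result matches the 1.5 v quoted above, and VDD = 5 v is inferred from VDD - VT = 4.3 v; neither value is stated explicitly in the text. The function names are illustrative.

```python
import math


def threshold_voltage(v_sb: float, v_t0: float = 0.7,
                      gamma: float = 0.556, phi: float = 0.3) -> float:
    """Body-effect threshold: Vt = Vt0 + gamma*(sqrt(Vsb + 2*phi) - sqrt(2*phi)).

    gamma is an assumed value (not given in the text); v_t0 and phi are
    the 0.7 v and 0.3 v quoted in the text.
    """
    return v_t0 + gamma * (math.sqrt(v_sb + 2 * phi) - math.sqrt(2 * phi))


def nmos_pass_high_level(v_dd: float = 5.0, v_t0: float = 0.7) -> float:
    """Highest level an NMOS pass gate transfers, using the same one-step
    estimate as the text: Vsb is taken as VDD - Vt0."""
    v_sb = v_dd - v_t0
    return v_dd - threshold_voltage(v_sb, v_t0=v_t0)


if __name__ == "__main__":
    print(round(threshold_voltage(4.3), 2))   # close to 1.5 with the assumed gamma
    print(round(nmos_pass_high_level(), 2))   # close to 3.5
```

With zero source-to-bulk voltage the function returns V_{t0}, which is why the PMOS gate with its bulk tied to its source does not suffer the same loss.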
For the PMOS transmission gate, this body-effect can be eliminated by connecting 
the bulk node of the PMOS to its source node. This is possible because most processes are 
n-well processes, so the bulk node of each PMOS can be tied to its own voltage instead 
of all being tied together to one fixed voltage. So the PMOS transmission gate has better 
voltage performance than the NMOS one. 
The second factor that must be taken into consideration when choosing the 
transmission gate is its highest working frequency. To find it, a DFF is built from each 
of the three kinds of transmission gates and the clock frequency is increased gradually 
until the DFF fails, while the frequency ratio of the clock to the input signal is kept 
at 2:1. 
The result of the simulation is summarized in Table 4.1. 
Table 4.1 Speeds of DFF With Different Transmission Gates 
Gate Type               CMOS    NMOS    PMOS 
Highest Frequency (Hz)  358M    357M    100M 
The simulation shows that only the CMOS and NMOS transmission gates reach 
the required working frequency. Considering the voltage level problem of the NMOS 
transmission gate, the CMOS transmission gate is the best choice for the design. 
4.4.2 Working Performances with Different Sizes of Transistors 
The working performance of the circuit is largely affected by the sizes of the 
transistors. In CMOS static circuits, a larger transistor gives a larger current under the 
same gate voltage, which means stronger driving ability. But a larger transistor also 
means a larger load capacitance for the previous gate. Simulation is first done with the 
design using minimum size transistors. Based on the waveforms of the internal nodes, we 
increase the sizes of the driver transistors for the slower nodes, while keeping the faster 
node driver transistors at the minimum size. The optimum sized DFF can work up to 
516MHz, while the minimum sized DFF can work up to 356 MHz. 
Figure 4.4 CMOS Static DFF with Improved Size Transistors (signals VCLK, VCLKB and VIN; transistor W/L ratios from 4/2 to 10/2) 
4.4.3 Timing Simulation About DFF 
When using a DFF, it is necessary to understand the timing of the various signals 
involved. This is especially true when different components are interconnected. 
If gates are to function properly, the clock and data signals must meet certain 
requirements. There are two such important specifications for a DFF, the setup time and 
the hold time. 
The setup and hold times of a register are the deviations from an ideal register 
caused by finite circuit delay. 
The setup time is the delay between the 50% point of the voltage swing of the incoming 
signal and the 50% point of the reading edge of the clock, i.e. the input must 
be set up at least the 'setup time' before the reading clock edge. 
The hold time is the delay between the 50% point of the reading 
edge of the clock and the 50% point of the changing edge of the input that is needed 
for the input to be read into the DFF correctly. 
From the view of charge and discharge, the setup time is the time to charge (or 
discharge) the input node to the right input state. If the input driver were an ideal voltage 
source, no time would be needed to charge (or discharge) that node. But in real 
circuits, this is the time for the driver which drives the DFF to charge (or discharge) its 
output node to the right state before the reading edge of the clock for the DFF. The hold 
time is the time to charge (or discharge) the internal nodes of the DFF to the right state, so 
it is determined by the charge (or discharge) current and the parasitic capacitance of the 
internal nodes within the DFF. 
If the data of a register does not obey the setup and hold time constraints, a 
potential clock race problem may occur. This race results in erroneous data being stored 
in the register. 
In order to simulate the setup time for the DFF, the input and clock in Figure 4.5 
are used: 
Figure 4.5  DFF Clock & Input Relation 
The simulation is done by moving the up-edge of the input step-by-step towards 
the up-edge of the clock, until the operation fails. 
The simulation shows the setup time for the circuit is 0 ns for both T=25° C and 
T.-220°C , assuming that the node capacitance at the input point is 50 fF. 
In order to get the hold time for the circuit, the down-edge of the input is pushed 
backwards towards the up-edge of the clock until the circuit fails to read in the right input. 
Simulation gives a hold time to be 2.3ns for T=25° C and 2.4ns for T.-220 ° C, 
assuming the output node load is 50 fF. 
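The setup and hold constraints amount to a forbidden window around the reading clock edge. The sketch below checks whether a data transition falls inside that window; the defaults use the simulated 0 ns setup time and 2.3 ns hold time, while the function name violates_timing and the treatment of the window boundaries as allowed are assumptions made for illustration.

```python
def violates_timing(data_change_ns: float, clock_edge_ns: float,
                    setup_ns: float = 0.0, hold_ns: float = 2.3) -> bool:
    """Return True if the data transition lies strictly inside the window
    (clock_edge - setup, clock_edge + hold), where the input must be stable."""
    window_start = clock_edge_ns - setup_ns
    window_end = clock_edge_ns + hold_ns
    return window_start < data_change_ns < window_end


if __name__ == "__main__":
    edge = 10.0  # reading clock edge at 10 ns (arbitrary example value)
    for t in (9.0, 11.0, 12.5):
        print(t, violates_timing(t, edge))
```

With a 0 ns setup time only transitions after the edge can violate the window, which is why the hold time dominates the timing budget of this DFF.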
4.5 Summary 
This design uses a CMOS static DFF. It has a 0 ns setup time and a 2.3 ns hold time. The 
DFF with a PMOS transmission gate has a much lower speed than that with an NMOS or 
CMOS transmission gate. The DFF with NMOS or CMOS gates can work up to 350MHz 
using a 1.1 micron CMOS process, but the NMOS transmission gate has a very serious 
body-effect that can't be eliminated in n-well processing. So the CMOS transmission gate is 
chosen for the design. The size of the transistors in the DFF affects the performance of 
the circuit. But the optimum size isn't available due to the limitations of the tools. 
Consider a digital system containing many master-slave flip-flops, with the 
outputs of some flip-flops going to the inputs of other flip-flops. Assume that the clock 
pulse reaches all flip-flops at the same time. At the beginning of each clock pulse, some 
of the master elements change state, but all the flip-flop outputs remain at their previous 
values. After the clock pulse returns to 0, some of the outputs change state, but none of 
these new states has any effect on any of the master elements until the next clock pulse. 
Thus the states of flip-flops in the system can be changed simultaneously during the same 
clock pulse, even though outputs of flip-flops are connected to the inputs of flip-flops. 
This is possible because the new states appear at the output terminals only after the clock 
pulse has returned to 0. Therefore, the binary content of one flip-flop can be transferred 
to a second flip-flop and the content of the second transferred to the 
first, and both transfers can occur during the same clock pulse. 
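The simultaneous exchange can be sketched as a two-step update: every master samples its input first, then every slave takes over its master's value. The function name clock_pulse and the connections list are hypothetical and only serve this illustration.

```python
from typing import List


def clock_pulse(states: List[int], connections: List[int]) -> List[int]:
    """Apply one clock pulse to a set of master-slave flip-flops.

    states[i] is the present output of flip-flop i, and connections[i]
    is the index of the flip-flop whose output drives the D input of
    flip-flop i.
    """
    # Clock high: every master samples the old output of its source.
    masters = [states[source] for source in connections]
    # Clock back to 0: every slave copies its master at the same time.
    return masters


if __name__ == "__main__":
    # Two flip-flops wired to each other exchange contents in one pulse.
    print(clock_pulse([1, 0], [1, 0]))
    # Four flip-flops wired as a ring shift by one position.
    print(clock_pulse([1, 0, 0, 0], [3, 0, 1, 2]))
```

The exchange works because the masters only ever see outputs from before the pulse.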
Chapter 5. Function Blocks 
The system is composed of the following function blocks: Shift Register, Latch 
Register, Wallace Tree and Carry-look-ahead & Carry-select Adder. 
5.1 Shift Register and Latch Register 
The first block of the system is 1024 shift registers. This block latches the 64 
unweighted bits in parallel 8 times, so it takes the form of a 64 x 8 matrix. The 
inputs of the DFFs work at 156MHz. 
The block that follows the shift register block is the latch register block. This 
block takes the 8 rows of 64 register outputs and converts them to 1024 bits in parallel. The 
input frequency for this block is 20MHz. 
The schematic of this block is shown in Figure 5.1. 
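The interplay of the fast shift clock and the slow latch clock can be sketched behaviorally. The class below is a hypothetical model, not the schematic: width and depth are parameters (64 and 8 are the figures quoted above, and 8 is roughly the ratio of 156 MHz to 20 MHz), and the names Deserializer, fast_clock and latched are illustrative.

```python
from typing import List


class Deserializer:
    """Shift rows in at the fast clock, latch them all at the slow clock."""

    def __init__(self, width: int = 64, depth: int = 8) -> None:
        self.width = width
        self.depth = depth
        self.rows: List[List[int]] = []   # shift register contents
        self.latched: List[int] = []      # latch register contents

    def fast_clock(self, word: List[int]) -> bool:
        """Shift in one word; return True when the latch register updates."""
        if len(word) != self.width:
            raise ValueError("word width does not match the register width")
        self.rows.append(list(word))
        if len(self.rows) == self.depth:
            # Slow clock edge: present all stored rows in parallel.
            self.latched = [bit for row in self.rows for bit in row]
            self.rows = []
            return True
        return False


if __name__ == "__main__":
    d = Deserializer(width=4, depth=2)
    print(d.fast_clock([1, 0, 0, 1]))
    print(d.fast_clock([0, 1, 1, 0]), d.latched)
```

Holding the data in a separate latch register gives the following adder tree a full slow clock period to work on a stable input.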
5.2 Wallace Tree 
When three or more operands are to be added together, the speed of the 
traditional carry-ripple adder is restricted by the carry rippling between bits. When the 
number of bits is large, the traditional carry-ripple adder becomes so slow that it cannot 
be used in a speed sensitive system. Several techniques for this kind of multiple 
operand addition that attempt to lower the carry-propagation penalty have been 
proposed and implemented. The technique that is most commonly used is carry-save 
addition. In a carry-save adder (CSA), carry propagation is only allowed in the last 
step, while in all the other steps a partial sum and a sequence of carries are generated. 
Therefore, a CSA is capable of reducing the number of operands to be added from 3 to 
2, without any carry propagation. 
Figure 5.1 Shift Register and Latch Register Block (64-bit input into eight shift rows clocked at 156 MHz; the latch register outputs a 1024 bit stream at 20 MHz) 
5.2.1 Carry-Save Adders 
A carry-save adder can be implemented in several different ways. In the 
simplest implementation, the basic element of the carry-save adder is a full adder with 
three inputs, x, y, and z, whose arithmetic operation can be described by: 
x \cdot 2^i + y \cdot 2^i + z \cdot 2^i = c \cdot 2^{i+1} + s \cdot 2^i 
where 
x, y and z are the inputs of the CSA with weight 2^i 
c and s are the outputs of the CSA with weights 2^{i+1} and 2^i 
It is obvious that the full adder is a (3,2) CSA or counter. Besides the (3,2) counter, 
there are (7,3), (15,4), ... counters. A CSA can also be described in another way: it 
counts the number of ones among its equally weighted inputs and outputs the count 
in binary form. 
Figure 5.2 gives some numerical examples for these kinds of CSA. 
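The counter view of the CSA can be made concrete with a short sketch: a generic counter that outputs the number of ones in binary, and a gate-level full adder that should agree with it for three inputs. The function names counter and full_adder are illustrative and the bit ordering (most significant bit first) is an assumption of this example.

```python
from typing import List, Tuple


def counter(bits: List[int], out_width: int) -> List[int]:
    """(n, m) counter: count the ones among n equally weighted input bits
    and return the count as m bits, most significant bit first."""
    if len(bits) > (1 << out_width) - 1:
        raise ValueError("too many inputs for the requested output width")
    ones = sum(bits)
    return [(ones >> i) & 1 for i in range(out_width - 1, -1, -1)]


def full_adder(x: int, y: int, z: int) -> Tuple[int, int]:
    """Gate-level (3,2) counter: returns (carry, sum)."""
    s = x ^ y ^ z
    c = (x & y) | (x & z) | (y & z)
    return c, s


if __name__ == "__main__":
    print(counter([1, 1, 0], 2))                # (3,2) counter
    print(counter([1, 0, 1, 1, 0, 1, 1], 3))    # (7,3) counter
    print(counter([1] * 15, 4))                 # (15,4) counter
```

The same counter function covers (3,2), (7,3) and (15,4) because an m-bit output can represent at most 2^m - 1 ones.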
Figure 5.2 Examples of Single Bit CSA ((3,2), (7,3) and (15,4) examples) 
The structures for these basic counters are shown in Figure  5.3.  The real 
implementations for these counters are shown in Figure 5.4 and Figure 5.5. 
Figure 5.3  Counters ((3,2) counter, (7,3) counter, (15,4) counter) 
Figure 5.4 (3,2) and (7,3) Counter Structures 
Figure 5.5 (15,4) Counter Structure (inputs IN[0 14]; C i: ith weight; delay: five FA delays) 
5.2.2 Wallace Tree 
If the operands for the CSA or counters are multi-bit words, a way to organize 
the operations is a tree commonly called a Wallace Tree. Similar to the single bit CSA, the 
multi-bit CSA counts the number of ones in each bit position separately, without 
caring about the carry ripple from the lower weighted bits. Figure 5.6 is an example of 
a Wallace Tree dealing with real multi-bit numbers. 
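A word-level carry-save step can be written with bitwise operations, which makes the absence of carry propagation visible: every bit position is handled independently. This is a minimal sketch using Python integers as words; the function name carry_save_add is illustrative.

```python
from typing import Tuple


def carry_save_add(x: int, y: int, z: int) -> Tuple[int, int]:
    """Reduce three words to two (partial sum, carries) without carry ripple.

    Each bit position is an independent full adder: the sum bit stays in
    place and the carry bit moves one position to the left.
    """
    partial_sum = x ^ y ^ z
    carries = ((x & y) | (x & z) | (y & z)) << 1
    return partial_sum, carries


if __name__ == "__main__":
    s, c = carry_save_add(9, 7, 8)
    # The two output words still add up to the sum of the three inputs.
    print(s, c, s + c)
```

Only the final addition of the two output words needs a carry-propagating adder.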
Figure 5.6 Example of Multi-Bit Wallace Tree (real number form: 9, 7, 8 and 24, shown with the matching binary form) 
Figure 5.7 shows a three 20-bit Wallace Tree: three 20-bit numbers can be reduced 
to two 21-bit numbers in one full-adder delay. Figure 5.8 shows a 20-bit (7,3) Wallace 
Tree: seven 20-bit numbers can be reduced to two 22-bit numbers in four full-adder 
delays. Figure 5.9 shows a 20-bit (15,4) Wallace Tree: fifteen 20-bit numbers can be 
reduced to two 24-bit numbers in seven full-adder delays. 
Figure 5.7 Three 20-bit Wallace Tree Structure (three 20-bit words reduced to two words) 
Figure 5.8 Seven 20-bit Wallace Tree Structure (seven 20-bit words reduced to one 20-bit word and one 22-bit word; total delay is four full adder delays) 
Figure 5.9 Fifteen 20-bit Wallace Tree Structure (fifteen 20-bit words reduced to two 24-bit words; total delay: 5+1+1=7, seven full adder delays) 
In a Wallace Tree, the number of operands is reduced by a factor of 2/3 at each 
level if the basic (3,2) counter is used. Consequently, for k operands, 
Number of levels \approx \frac{\log(k/2)}{\log(3/2)} 
This equation only provides an estimate of the number of levels. If a different 
kind of basic counter is used, the number of levels may vary. 
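The level estimate can be compared with an actual reduction. The sketch below groups the operands three at a time, applies a (3,2) carry-save step to each group and counts the levels until two words remain; the grouping strategy and the function names wallace_reduce and estimated_levels are assumptions of this example rather than the layout used in the design.

```python
import math
from typing import Iterable, List, Tuple


def carry_save_add(x: int, y: int, z: int) -> Tuple[int, int]:
    """(3,2) step on whole words: returns (partial sum, shifted carries)."""
    return x ^ y ^ z, ((x & y) | (x & z) | (y & z)) << 1


def wallace_reduce(operands: Iterable[int]) -> Tuple[List[int], int]:
    """Reduce the operands to at most two words; return (words, levels)."""
    words = list(operands)
    levels = 0
    while len(words) > 2:
        full = len(words) - len(words) % 3      # operands that form triples
        next_words: List[int] = []
        for i in range(0, full, 3):
            next_words.extend(carry_save_add(*words[i:i + 3]))
        next_words.extend(words[full:])         # leftovers pass through
        words = next_words
        levels += 1
    return words, levels


def estimated_levels(k: int) -> float:
    """Estimate log(k/2) / log(3/2) for k operands."""
    return math.log(k / 2) / math.log(3 / 2)


if __name__ == "__main__":
    words, levels = wallace_reduce(range(1, 16))   # fifteen operands
    print(sum(words), levels, round(estimated_levels(15), 2))
```

For fifteen operands this grouping needs six levels while the formula gives about five, which illustrates that the equation is only an estimate; six levels also agrees with the six full adder delays quoted for the (3,2) based tree.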
5.2.3 Sign-Bit in Wallace Tree 
The Wallace Tree is very efficient at reducing the number of additions. The basic 
element of this structure is the counter, which causes some problems. As indicated by 
the name, counters only count the number of ones in the inputs, so they do not take care 
of the sign bits. 
So if the counters are used to deal with signed numbers, some pre-separation 
has to be done to separate the positive and negative numbers. Then the counters can be 
used to deal with each group separately. 
The results are four numbers: two of them are positive and two of them are negative. 
These numbers should be reunited with their sign bit information to recover the right 
numbers. 
For this design, the inputs of the Wallace Tree are signed binary numbers. For 
simplicity, it is first assumed that the inputs are evenly split between positive and 
negative, so the Wallace Tree is divided evenly into two parts, a positive part and a 
negative part. But this is seldom the case. The worst case is that all the numbers are 
positive or all are negative. 
The following adjustments can be made in order to make the circuit work in that 
case: 
1.	  Use an XOR at each of the inputs to decide which block, positive or negative, the 
input should be sent to. Meanwhile send a zero to the non-chosen block. 
2.	  Duplicate the structure in both the positive and negative blocks to make sure the circuit 
in each block can deal with 512 numbers instead of 256. 
Due to the duplication, the Wallace Tree will need two more layers, each reducing three 
20-bit words to two 21-bit words, at the end of the positive and negative blocks. 
So the Wallace Tree can be used for the addition of signed numbers at the price of 
a larger hardware implementation. 
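The routing of signed operands into two unsigned blocks can be sketched as follows. Python's built-in sum stands in for each Wallace Tree block here, and the function name signed_sum_via_unsigned_blocks is hypothetical; the point is only that each operand goes to one block as a magnitude while the other block receives a zero.

```python
from typing import Iterable, List


def signed_sum_via_unsigned_blocks(values: Iterable[int]) -> int:
    """Add signed numbers using two blocks that only see unsigned inputs."""
    positive_block: List[int] = []
    negative_block: List[int] = []
    for v in values:
        if v >= 0:
            positive_block.append(v)    # magnitude goes to the positive block
            negative_block.append(0)    # zero goes to the non-chosen block
        else:
            positive_block.append(0)
            negative_block.append(-v)
    # Each block adds unsigned numbers only; the signs are reunited at the end.
    return sum(positive_block) - sum(negative_block)


if __name__ == "__main__":
    print(signed_sum_via_unsigned_blocks([3, -5, 7, -1]))
```

Because every operand occupies a slot in both blocks, each block must be sized for the full operand count, which is the source of the extra hardware.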
5.2.4 Structure Trade-Off for Wallace Tree 
When parallel additions are done, there are several basic circuit elements to 
choose from, (3,2), (7,3), (15,4), etc. How to choose the basic element is a trade-off 
between the simplicity of the circuit and the total time required to finish the calculation. 
As shown in Figure 5.10, if the (3,2) counter is used instead of the (15,4) counter as the 
basic element, the total delay is only 6 full adder delays instead of 7 full adder delays. 
But as also seen from the two structures, using (15,4) as the basic element makes the 
structure simpler and easier to manage. 
In this design, two kinds of basic elements are used. (15,4) is used in the top level 
of the Wallace Tree to reduce the large number of inputs, while (3,2) is used for the 
following levels when the number of additions is reduced to a more manageable number. 
Figure 5.10 Fifteen 20-bit Wallace Tree with (3,2) to be the Basic Element (total delay is six full adder delays) 
5.3 Carry-Look-Ahead and Carry-Select Adder 
The outputs of the Wallace Tree are two positive 28-bit numbers and two 
negative 28-bit numbers. In order to add them into one 30-bit number, a 4-bit 
carry-look-ahead and carry-select adder is used. 
Figure 5.11 Carry-Look-Ahead & Carry-Select Adder (two four-bit carry-look-ahead blocks, one with c(n-1)=0 and one with c(n-1)=1, feeding a four-bit carry-select that outputs S(n) and C(n)) 
In Figure 5.11, one of the 'Four bits Carry-look-ahead' blocks does the addition 
assuming the carry input from the previous bit is 0, the other one does the addition 
assuming the carry input from the previous bit is 1, and the carry-select then selects the 
right output when the previous carry reaches this block. 
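The selection scheme can be sketched behaviorally. In this example each 4-bit block is computed twice, once per assumed carry-in, and the real carry picks one result; the block addition uses Python's + instead of look-ahead logic, and the default 32-bit width and the function names are assumptions made for illustration.

```python
from typing import Tuple


def add_block(a: int, b: int, carry_in: int, width: int = 4) -> Tuple[int, int]:
    """Add two width-bit blocks; returns (sum bits, carry out)."""
    total = a + b + carry_in
    return total & ((1 << width) - 1), total >> width


def carry_select_add(a: int, b: int, width: int = 32,
                     block: int = 4) -> Tuple[int, int]:
    """Carry-select addition of two width-bit words; returns (sum, carry)."""
    if width % block != 0:
        raise ValueError("width must be a multiple of the block size")
    mask = (1 << block) - 1
    result, carry = 0, 0
    for shift in range(0, width, block):
        a_blk = (a >> shift) & mask
        b_blk = (b >> shift) & mask
        # Both candidates are independent of the incoming carry, so in
        # hardware they can be computed in parallel for every block.
        sum0, carry0 = add_block(a_blk, b_blk, 0, block)
        sum1, carry1 = add_block(a_blk, b_blk, 1, block)
        # The carry from the previous block only drives the selection.
        blk_sum, carry = (sum1, carry1) if carry else (sum0, carry0)
        result |= blk_sum << shift
    return result, carry


if __name__ == "__main__":
    print(carry_select_add(0xFFFF, 1))
    print(carry_select_add(0xFFFFFFFF, 1))
```

The carry then only has to pass through one selection stage per block instead of rippling through every bit.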
5.4 Summary 
The input part of the system, the shift-register and latch-register blocks, converts 
the inputs to a data form that can be handled by the system. The Wallace Tree does the large 
number of additions for the system, with the help of the pre-processing part that deals with 
the sign bit. The cost of this may be twice the hardware for the realization. A traditional 
adder is still needed at the end of the Wallace Tree to get the final single multi-bit word. 
Chapter 6. Conclusion 
6.1 Conclusion 
In the design and the simulation, 1024 data points enter the designed FIR filter 
in the form of 64 bits in parallel at 156 MHz, and the system converts the information 
into one single 30-bit word at 20MHz. 
Since the data flow of the system is fully pipelined, the highest speed the system 
can attain is 1/(one FA delay), which is about 300MHz. 
There are about 2100 DFFs, 1024 XORs, 1024 ANDs, 13000 FAs and 32 4-bit 
carry-look-ahead adders. The estimated area for the gate layout is about 65 mm^2. If the 
pads and the routing area are taken to be twice the transistor area, the total die size 
will be around 100 mm^2. 
6.2 Future Work 
This design is based on the assumption that all the coefficients are unsigned. If 
the coefficients are signed binary numbers, the whole structure can be duplicated to 
handle them. 
This design uses some (7,3) and (15,4) counters in the Wallace Tree; the speed can be 
further increased by using (3,2) counters throughout. This will require more detailed 
design of more circuit levels. 
Based on the tree structure of the Wallace Tree, the basic floorplan can take a square 
form, with the inputs at the outer side of the chip and the output generated at the center 
of the chip. The chip area for successive levels shrinks in accordance with the shrinkage 
in the amount of data or additions at each level. 
Bibliography 
1	  Weste, Neil H. E. and Eshraghian, Kamran. 1992. Principles of CMOS VLSI 
Design, Addison-Wesley Publishing Company. 
2	  Waser, Shlomo and Flynn, Michael J. 1982. Introduction to Arithmetic for Digital 
Systems Designers, Holt, Rinehart and Winston. 
3	  Geiger, Randall L., Allen, Phillip E. and Strader, Noel R. 1990. VLSI Design 
Techniques for Analog and Digital Circuits, McGraw-Hill. 
4	  Neamen, Donald A. 1992. Semiconductor Physics and Devices: Basic Principles, 
IRWIN. 
5	  Mano, M. Morris. 1992. Digital Design, Englewood Cliffs, N.J.: Prentice Hall. 
6	  Proakis, John G. and Manolakis, Dimitris G. 1996. Digital Signal Processing: 
Principles, Algorithms and Applications, Englewood Cliffs, N.J.: Prentice Hall. 
7	  Lu, Shih_Lien and Ercegovac, M. Aug. 1990. "A novel CMOS implementation 
of double-edge-triggered flip-flops," IEEE J. Solid-State Circuits, vol. 25, 
pp. 1008-1010. 
8	  Koren, Israel. 1993. Computer Arithmetic Algorithms, Englewood Cliffs, N.J.: 
Prentice Hall. 
9	  Chirlian, Paul M. 1987. Analysis and Design of Integrated Electronic Circuits, 
second edition, New York: Harper & Row, Publishers. 
10	  Lu, Shih_Lien, ECE Department, OSU. Comment on "A New Design of the 
CMOS Full Adder", to be published. 
11	  Swartzlander, Jr., Earle E. 1992. Parallel Counters, IEEE Computer Arithmetic, 
vol. 1, pp. 90-93. 
Appendices 49 
Appendix A  Hspice Simulation Results C § 
CD" 
VAC>  (PS) 
-
VBC> 
VCD>  (PS)  1119 
A  PGP 
W=3u 
co  L=2u 
(P 
M1  M2  M7  M11  M12  M13 
M20 
I 
PGP 
W=3u 
L=2u 
(PS) 
VI 
'GP 
W=3u 
L=2u 
(P) 
PCP 
W=3u 
L=2u 
(P) 
PCP 
(P 
W=3u . 
L=2u 
PGP 
W=3u 
L=2u 
(P  ) 
'GP 
W=3u 
L=2u 
(P  ) 
PGPc1L 
W=3u 
=2u 
(P 
M3  M8  M11 
M2I 
C.)  PGP 
(P 
W=3u 
L=2u 
PCP 
(P 
W=3u 
L=2u 
PGP 
(P 
W=3u 
L=2u 
PGP 
(P 
W=3u 
L=2u 
VCB  C>VSB 
cn 
cn  111  M9  M15 
M22 
NGP  wr3u 
O 
NGP  W=3u 
L=2u 
NGP  W=36 
L=2u  E>VC3 NGP1 
W=3u 
L=2u 
L=2u 
(N  N 
(N 
YGP 
M5 
W 3u  NGP 
MS 
w=3u 
M10 
NCP  wr3u 
Ml j
NCP  Wr3u 
M17 
NCP  W=3u 
M18 
NCP  W=3u 
M23 
NCP  W=3u 
L=2u 
(N 
L 2u  L=2u  I  L=2u  L=2u  L=2u  L=2u 
(NS)  (N )  (N  )  (NBODY-5­ (N )  (NM 
M21 
NGP  W 3u 
L=2u 
(N 
Full Adder with Minimized Size Transistors 
(NS) DO:AO:va X 
_ ___ _i_  - 1,1­
i 
-
Panel 11 
II  -, 
DO:AO:vcb  4 
I,  1  I  fa simulition with MM siz at T.-220C 
DO:AO:vsb 
11 
'11 
1 
li 
_ z 2  1 
0 
0  50n  100n  150n  200n 
Time (lin) (TIME) 
250n  300n  350n  40C 
Panel 12 
Wave  Sysitc 
D0:A0:vb 
DO:AO:vcb  ( 
DO:AO:vsb A 
0 
50n  100n  150n  200n 
Time (11n) (TIME) 
250n  300n  350n  40C 
Panel 13 
Wave  Symbol 
D0:A0:vc 
DO:AO:vcb 
DO:AO:vsb A 
0 
T 
50n  100n  150n  200n  250n  300n  350n  40C 0 
Time (lin) (TIME) 52 
Panel 1 
Wave  Symbol
DO:AO:va  /,  fa with min size at T=25 V=5.5v 
DO:AO:vsb 
DO:AO:ps  4 
0 
0 
50n  100n  150n  200n 
Time (lin) (TIME) 
250n  300n  350n  40C 
Panel 2 
Wave  Symbol 
DO:AO:vb r 
DO:AO:vcb 
4 
> 2 
0 _1 
50n  100n  150n  200n  250n  300n  350n 0 
Time (lin) (TIME) 
Panel 3 
WaveMI 
DO:AO:vc 
DO:AO:vsb 
4 
2 
1 
50n  100n  150n  200n  250n  300n  350n  40C 0 
Time (lin) (TIME) 
Figure A.3 Full Adder Simulation with Minimum Sized Transistors(25C, 5.5V) VAC>  (PS) 
VBC> 
VCC>  (PS) 
A 
M19 
W=3u 
PGPciL =2u 
(P 
PGP 
MI 
W=32u 
Lr2u 
(P  )  VI 
PCP 
112 
W-32u 
L-2u 
(P  ) 
PCP 
117 
W=3u 
L=2u 
(PS) 
M11 
W=3u 
L=2u 
(Kr 
M12 
W=3u 
L=2u 
(Ps) 
M13 
W=3u 
L=2u 
(P 
M20 
PGP 
W.3u 
L=20 
(P 
M3  M8  M11 
M21 
PGP 
W=32u 
L=2u 
PCP 
(P 
W=3u 
L=2u 
PcPc1W=3u 
L=2u 
(P 
PGPciW=3u 
L=2u 
(P 
M1 
NGP  w=isu 
L=2u 
(N 
NGP 
M9 
N 
W=3U 
L=2u 
VCB 
E> VC 
M15 
NGP  W=3u 
3  L=2u 
M22 
NGP  W=3u 
L=2u 
(N 
C>VSB 
23 
NGP  W=3u 
JGP 
M5 
W=16u 
L=2u 
NS) 
NGP 
MG 
W=1Gu 
L=2u 
L_ 
(NS) 
M10 
NGP.1 
(N 
W=3u 
L=2u 
M16  M17 
NGP 73) NGP  W=3u 
L 2u  L=2u 
(NBOD''IT  (NS) 
M18 
NGP 73u 
L 2u 
(NST 
L=2u 
(N 
21 
NCP  14=3u 
L=2u 
(N 
Full Adder with Optimum Size Transistors 
(NS) (PS) 
(PS) 
VOLKE> 
VINE> 
7  L, 
I I 
II  II 
Va  VOUT1 
-J; 
VB 
VOUT 
6 
VOLKBEE> 
(P  >  (NS) 
(NS) 
(PS) 
(P 
1=2u 
pcd 
2u, =2u 
.=  '  Y8u  1 NGP 
M20  M 
1.-­
19  M9 
'CP  wr1 s.J8u 
M17  M8 
NCP 
PCP 
W=6u 
L=2u 
VC  s) 
VOUTB 
mio 
W=lu  NCf 
L=2u 
(NS)  NS) 
y 
(NS) 
DFF with Improved Size Transistors 
(NS) Panel 1 
_Wave  symbol  1 
DO:AO:vin  >E-- 5  /'11  I  . \I \ 
00:AO:vout  ( t  /  /  I /
\ 
/ 
4.5  I 
\ 
I 
I 
I / 
I 
I /
ernes DFF 14156M T=-190C(lowest) for W/L=  u/2p(pmos) W/L- u(nm  s) [static) 
4  I  /  1 
I 
1 
I 
I  I 
I 
3.5 
I 
I 
I 
I 
I 
I 
11 
I 
I 
I 
I 
I 
I 
I 
3 
_ 
I 
I 
I  I 
I 
I 3  I 
I 
I  I 
I 
I 
I 
2.5 
a > 
I 
I 
I 
I 
I 
I 
I 
I 
I 
I 
I  I  I 
I 2 
I 
I 
I 
I 
I  I 
I 
I  I  I 
I 
I 
I 
I 
I 
I 
I  I 
I 
13 
I  I 
I  . 
I  I 
I 
I  i 
I  I 
I 
1 
I 
I 
I 
I 
1 
I 
I  I  I 
I  / 
I 500m 
I 
I  I 
I  I  I 
I 
/ 
/
4.  / 0  _ \ 
I 
I 
\ 
/ 
/ / 
I 
Il 
/I
_  r 
V'  -,/ 
I  r  -,  I  '--­
2n 4n  6n  8n  10n  12n  14n 0 
Time (lin) (TIME) Panel 1 
[Plot, Panel 1: vin and vout, amplitude 0 to 5, versus Time (lin), 0 to 14n. Panel title: NMOS static DFF, 156M, down to -150C, with NMOS W/L=8u/2u.]
[Plot, Panel 1: vout, amplitude 0 to 5, versus Time (lin), 0 to 14n. Panel title, partly illegible: NMOS static DFF, 156M, down to -130, with NMOS W/L=4u (length unreadable).]
[Plot: Panel 1 shows out and in; Panel 4 shows a and In; Panel 5 shows clk1 and clk2; Time (lin) axis, 0 to 38n.]
Static DFF with PMOS Transmission Gate (fmax=100M)
[Plot: Panel 1 shows out and in; Panel 4 shows a and In; a third panel shows clk1 and clk2; Time (lin) axis, 0 to 9.5n.]
Static DFF with CMOS Transmission Gate (fmax=357M)
Appendix B Wallace Tree Structures
Figure B.1 MAC 01
Figure B.2 MAC 02
Figure B.3 MAC 03
Figure B.4 MAC 04
Figure B.5 MAC 11 
Figure B.6 MAC 12
Figure B.7 MAC 21
Figure B.8 MAC 22
Figure B.9 MACRO 31
Figure B.10 MACRO 41
Figure B.11 MAC 3
Figure B.12 MAC 4
Figure B.13 MAC 5 
Figure B.14 MAC 6
Figure B.15 MAC 7
Figure B.16 MAC 8
Figure B.17 W3N20L20 
Figure B.18 W3N20B
Figure B.19 W3N21L20
Figure B.20 W3N26L20
[Dot diagram: W15N20B, 15 20-bit Wallace Tree.]
Figure B.21 W15N20B
 
[Dot diagram: W9N20B, with blocks labelled W3N20B, MAC 3, MAC 4, MAC 5, MAC 6, MAC 7, and MAC 8.]
Figure B.22 W9N20B
[Dot diagram, LEVEL III (to be continued): rows labelled W3N20L20, W3N20B (three times), W3N21L20, W3N26L20, MACRO 01, MACRO 02, MACRO 03, and MACRO 04.]
Figure B.23 Level III
[Dot diagram, LEVEL III (continued): two rows labelled KEEP, followed by rows labelled MACRO 11, MACRO 12, MACRO 21, MACRO 22, MACRO 31, and MACRO 41.]
Figure B.23 Level III (continued)
Appendix C Schematics of the Wallace Trees
[Schematic: FIR top level, with blocks labelled latch1024, xor1024, and m256; legible annotation: 256 x 20 bits -> 4 (group) x 17 x 20 bits.]
Figure C.1 FIR Top Level Schematic
[Schematic: SHIFT1024.]
[Schematic: SHIFT64, with SET, IN, and CLK inputs and OUT[0:63] outputs.]
Figure C.3 Shift 64
[Schematic: LATCH1024, with IN and CLK inputs.]
[Schematic: LATCH64, with CLK and IN inputs and OUT[0:63] outputs.]
Figure C.5 Latch 64
[Schematic: block label XOR1024; xor64 blocks operate on 64-bit slices from A[0:63], B[0:63] up to A[448:511], B[448:511], producing OUT[0:63] up to OUT[448:511].]
Figure C.6 XOR 512
[Schematic: XOR64, with input bus B[0:63] and outputs OUT0 to OUT63.]
Figure C.7 XOR 64
[Schematic: M256, built from m32 blocks.]
Figure C.8 M256
[Schematic: M32, with inputs C00[0:19] to C31[0:19] and outputs OUT00[0:19] to OUT31[0:19].]
Figure C.9 M32
[Schematic: M8.]
Figure C.10 M8
Figure C.11 FA 24 or Counter (3,2)
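A (3,2) counter takes three bits of equal weight and returns their count as a sum bit (weight 1) and a carry bit (weight 2), which is the full-adder function named in the caption. The Python sketch below is illustrative only: the function name `full_adder` is an assumption and is not taken from the schematic, and it models logic behaviour, not the transistor-level design.

```python
def full_adder(a, b, cin):
    """(3,2) counter: compress three equal-weight bits into (sum, carry)."""
    s = a ^ b ^ cin                      # weight-1 output
    carry = (a & b) | (cin & (a ^ b))    # weight-2 output
    return s, carry


# Exhaustive check: the two outputs always encode the number of ones.
for value in range(8):
    a, b, c = (value >> 2) & 1, (value >> 1) & 1, value & 1
    s, carry = full_adder(a, b, c)
    assert 2 * carry + s == a + b + c
```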
 
Figure C.12 Counter (7,3)
[Schematic: W15TO4, 15 in (unweighted) --> 4 weighted out, built from full-adder (fa) blocks.]
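Reducing 15 unweighted bits to a 4-bit weighted count can be modelled as repeated use of full adders, with each carry passed to the next higher weight. The Python sketch below is one possible arrangement under that assumption; the names `full_adder`, `half_adder`, and `count_ones` are hypothetical, and the wiring in the thesis schematic may differ.

```python
def full_adder(a, b, cin):
    """(3,2) counter: three equal-weight bits in, (sum, carry) out."""
    return a ^ b ^ cin, (a & b) | (cin & (a ^ b))


def half_adder(a, b):
    """Two equal-weight bits in, (sum, carry) out."""
    return a ^ b, a & b


def count_ones(bits):
    """Reduce equal-weight bits to a binary count, least significant bit first."""
    columns = {0: list(bits)}   # weight index -> bits still to be combined
    out = []
    weight = 0
    while weight in columns:
        col = columns[weight]
        while len(col) > 1:
            if len(col) >= 3:
                s, c = full_adder(col.pop(), col.pop(), col.pop())
            else:
                s, c = half_adder(col.pop(), col.pop())
            col.append(s)                                   # sum keeps its weight
            columns.setdefault(weight + 1, []).append(c)    # carry moves up one weight
        out.append(col[0] if col else 0)
        weight += 1
    return out


# Usage: 15 unweighted inputs give 4 weighted outputs (weights 1, 2, 4, 8).
inputs = [1, 0, 1, 1, 0, 0, 1, 1, 1, 0, 1, 0, 1, 1, 0]
outputs = count_ones(inputs)
assert len(outputs) == 4
assert sum(bit << i for i, bit in enumerate(outputs)) == sum(inputs)
```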
[Schematics whose captions did not survive extraction. Block labels in order of appearance: W3N20B, MAC 03, W3N20L20, MAC 5, MAC 6, MAC 7, MAC 8, W9N20B, W3N20, W3N21L20, W3N26L20, MAC 01, MAC 02, MAC 03, MAC 04, MAC 11, MAC 12, MAC 21, MAC 22, MAC 31, MAC 41, LEVEL III.]
[Schematics whose captions did not survive extraction. Block labels in order of appearance: MAC03, MAC11, MAC5.]
 
[Schematic: FA31, built from 4-bit carry-select blocks (with carry-lookahead).]
Figure C.36 4-bit Carry-Select Adder
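A carry-select adder computes each 4-bit block twice, once assuming a carry-in of 0 and once assuming 1, and then picks the correct result when the real carry arrives. The Python sketch below models that behaviour only; the names `add4` and `carry_select_add` and the `width` parameter are assumptions for illustration and do not come from the schematic.

```python
def add4(a4, b4, cin):
    """Add two 4-bit values plus a carry-in; return (4-bit sum, carry-out)."""
    total = a4 + b4 + cin
    return total & 0xF, total >> 4


def carry_select_add(a, b, width=20):
    """Add two width-bit values using 4-bit carry-select blocks."""
    result, carry = 0, 0
    for shift in range(0, width, 4):
        a4 = (a >> shift) & 0xF
        b4 = (b >> shift) & 0xF
        s0, c0 = add4(a4, b4, 0)    # candidate result for carry-in 0
        s1, c1 = add4(a4, b4, 1)    # candidate result for carry-in 1
        s, carry = (s1, c1) if carry else (s0, c0)   # select on the real carry
        result |= s << shift
    return result, carry


assert carry_select_add(12345, 54321) == (66666, 0)
assert carry_select_add(0xFFFFF, 1) == (0, 1)
```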
Figure C.37 4-bit Carry-Look-Ahead Adder
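In a carry-look-ahead adder, every carry is written directly in terms of the generate signals g_i = a_i AND b_i, the propagate signals p_i = a_i XOR b_i, and the carry-in, so no carry has to ripple through the lower bit positions. The Python sketch below uses the standard textbook equations for a 4-bit block; the function name `cla4` is an assumption, and the gate-level structure in the thesis schematic may differ.

```python
def cla4(a, b, c0=0):
    """4-bit carry-look-ahead add; return (4-bit sum, carry-out)."""
    g = [(a >> i) & (b >> i) & 1 for i in range(4)]     # generate
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(4)]   # propagate
    # Each carry depends only on g, p, and c0.
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c0)
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0])
          | (p[3] & p[2] & p[1] & p[0] & c0))
    carries = [c0, c1, c2, c3]
    s = sum((p[i] ^ carries[i]) << i for i in range(4))
    return s, c4


# Exhaustive check against ordinary integer addition.
for a in range(16):
    for b in range(16):
        for c in (0, 1):
            total = a + b + c
            assert cla4(a, b, c) == (total & 0xF, total >> 4)
```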
 