Design of complex digital blocks using folded source-coupled logic for mixed-mode applications by Kiaei, Sayfe
AN ABSTRACT OF THE THESIS OF
Sailesh R. Maskai for the degree of Master of Science in Electrical and Computer
Engineering presented on May 7, 1991.
Title: Design of Complex Digital Blocks using Folded Source-Coupled Logic for Mixed-
Mode Applications.
Redacted for Privacy
Abstract approved:
Sayfe Kiaei
A series of complex digital blocks has been designed and fabricated using the newly
developed current-mode differential CMOS logic family, viz. Folded Source-Coupled
Logic ( FSCL ). The main feature of this logic family is the low current spikes generated
during the switching transitions ( at least 2 orders of magnitude smaller than those of
conventional static CMOS gates ). The design of a decimation filter using a novel Multi-Rate
systolic architecture and its implementation in Folded Source-Coupled Logic is also
considered. The decimation filter thus designed can be used in mixed-mode applications
such as Sigma-Delta A/D converters to improve performance characteristics such as dynamic
range, resolution and phase linearity at higher sampling rates.

Design of Complex Digital Blocks using Folded Source-Coupled Logic for Mixed-Mode
Applications.
by
Sailesh R. Maskai
A THESIS
submitted to
Oregon State University
in partial fulfillment of
the requirements for the
degree of
Master of Science
Completed May 7, 1991
Commencement June 1992

APPROVED:
Redacted for Privacy
Assistant Professor of Electrical and Computer Engineering, in charge of major
Redacted for Privacy
Head of Department of Electrical and Computer Engineering
Redacted for Privacy
Dean of Graduate School
Date thesis is presented May 7, 1991
Typed by Sailesh R. Maskai for Sailesh R. Maskai

ACKNOWLEDGEMENTS
I would like to thank my major professor, Dr. Sayfe Kiaei, for allowing me to work on this
interesting project. Also, his patience with me is appreciated. My sincere gratitude also
goes to Dr. David Allstot who has been the source of inspiration throughout my stay at
Oregon State University.
A grant from National Science Foundation Center for the Design of Analog-Digital
Integrated Circuits (CDADIC) at Washington State University, Pullman, WA, and the
University of Washington, Seattle, WA has helped me financially during my graduate
studies here.
San-Hwa Chee and Gayani Gamage have provided valuable discussions and help and
deserve much credit for this work.
Most of all, I would like to thank my friends Cameron Fajer, Rajesh Zele, Krishna Nallani,
Vivek Subramanian, Satish Srinivas and Lalit Merani who have helped me during my
rough ( party ) times in Corvallis.
Finally, appreciation also goes to my minor professor Dr. James Herzog, Prof. Rathja and
Prof. Bushnell for taking time out from their hectic schedules to preside over my defense.

TABLE OF CONTENTS
1. INTRODUCTION.
   1.1 FOLDED SOURCE-COUPLED LOGIC.
   1.2 GOALS AND OUTLINE.
2. COMPLEX FSCL GATES.
   2.1 SERIES-GATING TECHNIQUE.
   2.2 MINIMIZATION PROCEDURE.
       2.2.1 COMPLEMENTARY PAIR ELIMINATION ( CPE ).
       2.2.2 NORMAL PAIR ELIMINATION ( NPE ).
   2.3 MERITS AND DEMERITS OF SERIES-GATING TECHNIQUE.
   2.4 SIMULATION RESULTS.
   2.5 EXPERIMENTAL RESULTS.
3. DECIMATION FILTER USING MULTI-RATE ARRAY.
   3.1 CONVENTIONAL APPROACH.
       3.1.1 DEPENDENCY VECTORS.
       3.1.2 PROJECTION VECTOR ( d ).
       3.1.3 SCHEDULING FUNCTION ( s ).
   3.2 DEMERITS OF THE CONVENTIONAL APPROACH.
   3.3 MULTI-RATE ARRAY APPROACH.
       3.3.1 DIRECTIONALLY UNIFORM RECURRENCE EQUATIONS.
       3.3.2 MULTI-RATE ARRAY FOR THE DECIMATION FILTER.
       3.3.3 DEPENDENCY VECTORS.
       3.3.4 UNIFORM HYPERPLANES ( C ).
       3.3.5 PROJECTION VECTOR ( d ).
       3.3.6 MULTI-RATE CLOCK RATIO ( K ).
       3.3.7 TIMING FUNCTION.
4. IMPLEMENTATION OF THE PROCESSING ELEMENT USING FSCL.
   4.1 DESIGN OF A MULTIPLIER-ADDER.
   4.2 IMPLEMENTATION OF THE MULTIPLIER-ADDER IN FSCL.
   4.3 MULTIPLIER-ADDER AS THE PE FOR DECIMATION FILTER.
   4.4 THE COMPLETE DECIMATION FILTER.
5. CONCLUSIONS.
BIBLIOGRAPHY.

LIST OF FIGURES
Figure
1   VDD current spikes generated in minimum size static CMOS inverter.
2   Basic Folded Source-Coupled Logic ( FSCL ) inverter.
3   A generalized FSCL structure for a complex logic function.
4   A two variable logic tree ( --- path terminations for NAND/AND function ).
5   Examples of redundancy in a logic tree.
6   Example of complementary pair elimination ( CPE ).
7   Example of normal pair elimination ( NPE ).
8   Minimization of logic tree for positive level triggered D-Latch.
9   Two input NAND/AND gate.
10  Power-delay curves for 2 input NAND/AND gates.
11  VDD current spikes for 2 input NAND/AND gates.
12  Three input NAND/AND gate.
13  Positive level triggered D-Latch.
14  Positive level triggered D-Latch with preset and clear.
15  Test structure for 39 stage ring oscillator using 2 input NAND gates.
16a Test structure for a cascade of 8 D-Latches.
16b Test structure for a cascade of D-Latches with preset and clear inputs.
17a Basic block diagram of a sigma-delta modulator based A/D converter.
17b Block diagram of a decimation filter ( decimation factor = M ).
18  The preliminary dependency graph ( DG ).
19  The complete dependency graph ( conventional approach ).
20  Conventional systolic array for decimation filter ( M = 2, N = 3 ).
21  Dependency graph for decimation filter ( multi-rate approach ).
22  Multi-rate processor array for decimation filter.
23  Snapshots for the multi-rate processor array shown in Figure 22.
24  Block diagram of the processing element ( PE ) in multi-rate array.
25  Logic diagram of a typical TC/TC multiplier bit.
26  Block diagram of the TC/TC multiplier bit.
27  Logic diagram of a 4 bit TC/TC serial-parallel multiplier-adder.
28  Block diagram of the 4 bit TC/TC serial-parallel multiplier-adder.
29  Simulation results ( QuickSim ) for 4 bit TC/TC multiplier-adder.
30  AND-OR gate.
31  Data Selector ( 2:1 MUX ).
32a Full adder carry bit.
32b Full adder sum bit.
33  D-Latch with data selector as D input.
34  Simulation results ( MSPICE ) for 4-bit TC/TC multiplier-adder.
35  Configuration of the 4 bit multiplier-adder as the PE.
36  The complete decimation filter.
37  Simulation results ( QuickSim ) for the complete decimation filter ( M=2 ).

LIST OF TABLES
Table
1   Summary of simulation results.
2   Summary of simulation and experimental results.
3   Booth's algorithm.

DESIGN OF COMPLEX DIGITAL BLOCKS USING FOLDED SOURCE-COUPLED LOGIC
FOR MIXED-MODE APPLICATIONS.
1. INTRODUCTION.
Demands for higher levels of system integration have led to the need for mixed-mode
analog-digital ICs. While the conventional static CMOS logic family has several desirable
features, it is highly unsuitable for mixed-mode ICs due to the large overlap current spikes
generated during the switching transitions ( Wallmark[1] has summarized the noise in
digital VLSI circuits ). Figure 1 shows the VDD current spikes ( simulation results )
generated in a minimum size static CMOS inverter clocked at 20 MHz. The digital
switching noise thus generated propagates to the analog section of the mixed-mode IC through
the common substrate or supply lines and severely degrades the dynamic range and phase
linearity of the system[2]. Shoji[3] attributes the generation of noise in VLSI circuits to two
distinct mechanisms: the induced noise, which results from the transmission of a noise voltage
from one node to another, and the power bus noise, which is due to the current spikes through the
resistance and inductance of the chip's power bus, bonding wires and package interconnects.
Several techniques have been proposed to reduce the digital noise[4-5]. These include
filtering the power buses using on-chip active filters, active guardbanding to increase
the isolation between separate analog and digital circuitry on the same IC, and the use of
separate power and ground lines for the analog and digital sections. Though effective to a
certain extent at low frequencies, these techniques are virtually ineffective at higher
frequencies.
1.1 FOLDED SOURCE-COUPLED LOGIC.
The newly developed CMOS Folded Source-Coupled Logic ( FSCL ) [6-7] is aimed at
reducing the power supply line noise rather than filtering its propagation to the analog
section of the mixed-mode ICs. FSCL employs constant current steering techniques similar
to the bipolar ECL topology as shown in Figure 2. The basic principle of this current-mode
logic is to maintain a constant current supply by using a differential pair. Because of the
constant bias current sources connected to the supply lines, the large current spikes
generated during the switching transitions are eliminated.

Figure 1. VDD current spikes generated in minimum size static CMOS inverter ( horizontal axis: Time ( nS ) ).

Figure 2. Basic Folded Source-Coupled Logic ( FSCL ) inverter.

The design of the FSCL inverters and the related experimental results have been dealt with
in [7]. The advantages offered by this logic family can be summarized as :
1) Minimized power bus noise ( at least 2 orders of magnitude smaller than the
conventional static CMOS logic[7] ).
2) High frequencies ( several hundred MHz) of operation.
3) Reduced voltage swing which implies reduced dynamic power dissipation at higher
frequencies.
4) Higher noise immunity due to the differential input topology.
5) Reduced area as compared to static CMOS logic due to the reduced number of
transistors for complex digital blocks ( will be shown later ).
6) Availability of complementary outputs.
The disadvantages on the other hand are:
1) Due to the constant bias current sources connected to the supply lines, the FSCL gates
exhibit static power dissipation. Moreover, the maximum speed of operation and the digital
switching noise are both functions of ( directly proportional to ) the power dissipated per gate.
2) Increased area as compared to static CMOS logic due to the greater number of
transistors for simple digital blocks.
1.2 GOALS AND OUTLINE.
The main objective of this thesis is to design, implement, and characterize a series of
complex digital blocks using FSCL. These blocks can be used to implement high-speed
and low-noise digital blocks which would be ideally suited for mixed-mode applications
like Flash A/D converters and Sigma-Delta A/D converters. The subsequent chapters in this
thesis are organized as follows:
Chapter 2 shows a systematic method to design the complex FSCL gates and minimize the
number of transistors required. Simulation and experimental results for these gates are also
presented. Chapter 3 mainly deals with the design of a decimation filter using the Multi-
Rate systolic architecture and defines the basic processing element ( PE ) required in this
configuration. Chapter 4 discusses the design and implementation of the 4 bit TC/TC serial-
parallel adder-multiplier using FSCL logic blocks. This adder-multiplier can be easily
configured as the PE of the decimation filter designed in Chapter 3.

2. COMPLEX FSCL GATES.
The Folded Source-Coupled Logic ( FSCL ) inverter topology has the advantage that it
generates overlap current spikes of extremely low magnitude ( a few µA ) during the
switching transitions. It is ideally suited for mixed-mode IC applications where the digital
noise has an adverse effect on the performance of the overall system. The main limitation
of FSCL, however, is the static power consumed due to the connection of the constant
current sources to the supply lines. To achieve a reasonable speed ( about 200 MHz ), the
power dissipated per FSCL gate is about 1 mW[7]. Thus a digital section designed with
several thousand FSCL gates will imply a significant overhead on power consumption
and may well need special high-cost packaging.
The differential switching strategy used in FSCL gates is similar to the one employed in
Differential Current-Mode Logic ( DCML ) gates[8] which are usually implemented in
bipolar technology. The approach used to reduce the power consumption in these circuits is
the series-gating technique[8]. Using this technique, a complex logic function, instead of
being realized with several primitives such as FSCL NAND gates, can be realized within a
single gate. Thus besides reducing the power consumption, this technique also reduces the
number of transistors required to realize the complex logic functions and hence the area
occupied by these blocks on silicon.
2.1 SERIES-GATING TECHNIQUE.
The basic FSCL inverter is shown in Figure 2. It comprises the bias current sources
Iss1 and Iss2, the output PMOS diode loads ML1 and ML2, and the input differential pair
formed by MI1 and MI2. To understand the series-gating technique, the input differential
pair should be viewed as a one variable logic tree in which the 2^1 branches are formed by
the switching transistors. Each branch closes ( allows the current to pass ) under the logic high
condition. A complex logic function can be realized in FSCL by changing only the logic
tree and keeping the bias currents and the output diode loads the same. Figure 3 shows a
generalized FSCL structure to implement any complex function. Figure 4 shows a logic
tree of 2 variables. Note that the switching transistors are shown as circles in the logic tree
for convenience. Since the current Iss1 must eventually be drawn from one of the two
sources Iss2, completing the logic tree is only a question of terminating each path
specified by the truth table at either output terminal Q or $\overline{Q}$ according to the function being
implemented. Also shown in Figure 4 are the truth table for the 2 input NAND/AND function
and the path terminations that implement it. As can be observed, the rule for path
termination in the logic tree is: connect a path which has a '1' entry in the corresponding truth
table to the inverted output $\overline{Q}$ and, conversely, a path with a '0' entry in the truth table to the Q
output terminal. As can be seen from Figure 4, the logic tree to implement a 2 variable
function by the series-gating technique requires 6 switching transistors. In general, if the number
of input variables is n, then the number of transistors required in the logic tree is

$$\sum_{i=1}^{n} 2^{i}$$

By eliminating the redundancies in the logic tree, using the minimization
procedure described below, the total number of switching transistors can be reduced
considerably.

Figure 3. A generalized FSCL structure for a complex logic function.

Truth Table ( 2 input NAND function, shown in Figure 4 )
A  B  NAND
0  0  1
0  1  1
1  0  1
1  1  0

Figure 4. A two variable logic tree ( --- path terminations for NAND/AND function ).
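
The transistor count and the path termination rule described above can be illustrated with a small sketch. It is only an illustration under stated assumptions: the function names `full_tree_transistors` and `path_terminations` and the terminal labels "Q" and "Qbar" are hypothetical and do not come from the thesis.

```python
from itertools import product

def full_tree_transistors(n):
    """Switching transistors in an unminimized logic tree of n variables:
    level i of the tree contributes 2**i transistors."""
    return sum(2 ** i for i in range(1, n + 1))

def path_terminations(func, n):
    """Map every input combination (one path of the tree) to an output terminal.
    Rule from the text: a '1' truth-table entry terminates on the inverted
    output (labelled "Qbar" here), a '0' entry terminates on "Q"."""
    return {bits: ("Qbar" if func(*bits) else "Q")
            for bits in product((0, 1), repeat=n)}

def nand2(a, b):
    """Truth table of the 2 input NAND function."""
    return int(not (a and b))

if __name__ == "__main__":
    print(full_tree_transistors(2))      # count for a two variable tree
    for path, terminal in path_terminations(nand2, 2).items():
        print(path, "->", terminal)
```

For two variables the count agrees with the 6 switching transistors quoted for Figure 4, and only the path A = 1, B = 1 of the NAND function terminates on Q.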
2.2 MINIMIZATION PROCEDURE.
A structural characteristic of the logic tree is that a lower branch always splits into two sub-
branches, and the sub-branches split further in the same manner. If two sub-branches
eventually terminate at the same output terminal, then these 2 sub-branches are redundant.
This is shown in Figure 5a. Redundancy also exists if an equivalent path can be found for a
certain input state. This is demonstrated in Figure 5b.
Figure 5. Examples of redundancy in a logic tree.

Based on this observation, the elimination method can be formalized as follows:

2.2.1 COMPLEMENTARY PAIR ELIMINATION ( CPE ).
A pair of branches that arise from the same parent branch are termed as the complementary
pair. If the path terminations are identical leading from the complementary pair, then the
paths concerned can be minimized by eliminating one of the identical path groups arising
from the complementary pair, and the complementary pair itself. Figure 6 shows this
process.
Figure 6. Example of complementary pair elimination ( CPE ).
2.2.2 NORMAL PAIR ELIMINATION ( NPE ).
Any pair of branches arising from different parent branches at the same level are called the
normal pair. If the path terminations are identical leading from a normal pair, one of the
identical path groups can be eliminated by joining up the normal pair concerned. Figure 7
shows the process of NPE.
Figure 7. Example of normal pair elimination ( NPE ).

The complete minimization process of a logic tree can be thus described as a systematic
search in the original logic tree to identify any possibility of CPE and NPE. The search
should start from the top level and must continue down the tree. At each level, every
complementary pair and normal pair should be examined to check for identical path
terminations and either CPE or NPE may be applied as appropriate. Successive application
of this procedure will eliminate all the redundancies in the original logic tree and will lead to
a unique minimal circuit. It should be noted that CPE and NPE may be performed
independently of each other. Figure 8 shows the application of the above procedure to
derive the minimal logic tree for implementation of positive level triggered D-Latch.
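
The search described above can be approximated in software. The sketch below is not the procedure used in the thesis: it models the logic tree as nested tuples, treats CPE as dropping a pair whose two sub-branches are identical, and treats NPE as sharing structurally identical sub-trees, which is an assumption made for illustration. All function names are hypothetical, and the variable ordering ( Ck at the first level, then Qn, then D ) is also assumed.

```python
def build_tree(func, n, prefix=()):
    """Full (unminimized) logic tree; leaves name the output terminal."""
    if len(prefix) == n:
        # Rule from Section 2.1: '1' entry -> inverted output, '0' entry -> Q
        return "Qbar" if func(*prefix) else "Q"
    level = len(prefix)
    return (level,
            build_tree(func, n, prefix + (0,)),
            build_tree(func, n, prefix + (1,)))

def eliminate_complementary_pairs(node):
    """CPE-like step: a pair whose two sub-branches are identical is removed."""
    if isinstance(node, str):
        return node
    level, low, high = node
    low = eliminate_complementary_pairs(low)
    high = eliminate_complementary_pairs(high)
    return low if low == high else (level, low, high)

def count_transistors(node, share_identical=False):
    """Two switching transistors per remaining differential pair.
    share_identical=True models an NPE-like joining of identical sub-trees."""
    seen, total, stack = set(), 0, [node]
    while stack:
        cur = stack.pop()
        if isinstance(cur, str):
            continue                      # output terminal, no transistor
        if share_identical and cur in seen:
            continue                      # sub-tree already counted once
        seen.add(cur)
        total += 2
        stack.extend(cur[1:])             # visit both sub-branches
    return total

def d_latch(ck, qn, d):
    """Positive level triggered D-latch: follow D when Ck = 1, else hold Qn."""
    return d if ck else qn

if __name__ == "__main__":
    full = build_tree(d_latch, 3)
    minimal = eliminate_complementary_pairs(full)
    print(count_transistors(full))                           # full tree
    print(count_transistors(minimal, share_identical=True))  # after reduction
```

In this model the D-latch tree shrinks from 14 switching transistors to 6. The counts refer only to this simplified model and are not taken from Figure 8.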
2.3 MERITS AND DEMERITS OF SERIES-GATING TECHNIQUE.
The main advantage of series-gating technique results from the fact that a complex function
can be implemented as a single gate, rather than being implemented from a number of
NAND primitives. This implies considerable power and hardware reductions. Also this
technique leads to an improvement in performance because of the reduction of propagation
delay of a vertically stacked complex function as compared to an equivalent implementation
in several levels of primitive gates.
The disadvantages of series-gating technique, on the other hand, are that the increase in the
speed may only be realized in full if switching is effected at the top level. If switching is
activated at lower levels, then there will be a vertical delay path up the levels in addition to
the output delay. Also, realizing complex functions using this technique means that there
will be a unique circuit and a corresponding unique layout for each different gate
implemented.
With more levels of series gating, the improvement will be more significant because more
complex functions can be implemented as a single gate. However, more levels of vertical
stacking would increase the threshold voltages of the switching transistors higher up in the
stack, thus requiring higher supply voltage to accommodate them and consequently lead to
an increase in power consumption per gate. Taking into consideration the power supply
limitation ( VDD = 5 V ), the number of input variables for a complex FSCL function was
restricted to three to ensure the reliable switching of the FSCL gates.

Truth Table
Ck  Qn  D   Qn+1 = Q
0   0   0   0
0   0   1   0
0   1   0   1
0   1   1   1
1   0   0   0
1   0   1   1
1   1   0   0
1   1   1   1

( Stages shown in the figure: initial logic tree, level 1 CPE, level 2 CPE. )

Figure 8. Minimization of logic tree for positive level triggered D-latch.

2.4 SIMULATION RESULTS.
Figure 9 shows the two input NAND/AND gate designed using the series-gating technique.
A series of 2 input NAND/AND gates with different sizes ( different power consumption
per gate ) were simulated using SPICE for fanouts of 1, 3 and 5. Figure 10 shows the
power-delay curves obtained from these simulation results. As can be seen, the delay of the
FSCL NAND/AND gates is a strong function of the static power consumed per gate. The
delay per gate decreases as the power consumed per gate increases, because higher
power consumption implies larger bias currents to charge the capacitance at the output
nodes. Figure 11 shows the VDD current spikes for different sizes of 2 input NAND/AND
gates with different fanouts. They are of the order of a few µA, almost two orders of
magnitude smaller than those of the corresponding CMOS gates. As expected, the magnitude of the
current spikes increases with the bias currents ( i.e. with the power consumption per
gate ). The power-delay curves for different fanouts and the current spikes diagram enable
the circuit designer to select the power consumption per gate so as to obtain the optimum
combination of speed and digital noise for the given application.
Figure 9. Two input NAND/AND gate.

Figure 10. Power-delay curves for 2 input NAND/AND gates ( different fanouts ). Curves for FO 1, FO 3 and FO 5; horizontal axis: Power ( mW ) per gate.

Figure 11. Simulated VDD current spikes for 2 input NAND/AND gates. Horizontal axis: Power ( mW ) per gate.

The series-gating technique was also used to design more complex logic functions such as 3
input NAND/AND gates, a positive level triggered D-latch, and a positive level triggered
D-latch with preset and clear inputs. These gates are shown in Figures 12 through 14.

Figure 12. Three input NAND/AND gate.

Figure 13. Positive level triggered D-Latch.

Figure 14. Positive level triggered D-Latch with preset and clear.

Size # 1 : P ≈ 0.5 mW/gate.
Circuit                 Ck       Tpair    Trise     Tfall    VDD spike
3 I/P NAND              --       5.9 nS   10.3 nS   9.1 nS   5.0 µA
D-Latch                 20 MHz   6.0 nS   10.2 nS   9.6 nS   3.7 µA
D-Latch ( Pr & Clr )    20 MHz   6.4 nS   10.3 nS   9.6 nS   4.7 µA

Size # 2 : P ≈ 1.0 mW/gate.
Circuit                 Ck       Tpair    Trise     Tfall    VDD spike
3 I/P NAND              --       3.9 nS   7.4 nS    6.2 nS   8.1 µA
D-Latch                 20 MHz   4.0 nS   6.8 nS    6.3 nS   7.2 µA
D-Latch ( Pr & Clr )    20 MHz   4.3 nS   7.0 nS    6.3 nS   7.1 µA

Table 1. Summary of simulation results.

These gates were simulated for the sizes of 0.5 mW/gate and 1 mW/gate using SPICE, and the
results are tabulated in Table 1. The simulation results show that for a given power
consumption per gate, the VDD current spikes also depend on the complexity of the
function being implemented. Again, the magnitude of the current spikes is in the range of
a few µA.
2.5 EXPERIMENTAL RESULTS.
To experimentally verify the simulation results obtained, several structures using FSCL
gates have been fabricated using the MOSIS 2 µm p-well CMOS process. In particular, 39
stage ring oscillators using 2 input NAND/AND gates for two different sizes have been
fabricated. The test structure of these ring oscillators is as shown in Figure 15.
Figure 15. Test structure for 39 stage ring oscillator using 2 input NAND gates.
To characterize the D-latches, a cascade of 8 D-latches for two different sizes has been
fabricated, the test structure being shown in Figure 16a. A cascade of D-latches with preset
and clear inputs for two different sizes has also been fabricated, with the test structure as
shown in Figure 16b.
Note that in each of these test structures, the ground line of one of the gates has been
separated from the rest to enable us to measure the switching noise of the single complex
gate. The simulation and the experimental results have been tabulated in Table 2. As can be
seen from this table, the simulation and the experimental results show considerable
agreement. Slight discrepancies can be attributed to the process parameter variations and the
parasitic capacitances associated with the circuits due to the layout style.

Figure 16a. Test structure for a cascade of 8 D-Latches.

Figure 16b. Test structure for a cascade of D-latches with preset and clear inputs.

                                Size # 1 : P ≈ 0.5 mW/gate          Size # 2 : P = 1.0 mW/gate
Circuit                         Tpair ( Sim. )   Tpair ( Expt. )    Tpair ( Sim. )   Tpair ( Expt. )
39 Stage Ring Osc.
( 2 I/P NAND gates )            4.8 nS           4.32 nS            3.1 nS           ---
D-Latch                         6.0 nS           7.80 nS            4.0 nS           4.28 nS
D-Latch with
Preset and Clear                6.4 nS           6.28 nS            4.3 nS           5.00 nS

Table 2. Summary of simulation and experimental results.
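
To make the comparison in Table 2 concrete, the relative deviation of each measured pair delay from its simulated value can be computed. The following is a minimal sketch; the dictionary layout and the function name `percent_deviation` are hypothetical, and the numbers are copied from Table 2 ( the missing experimental entry is represented as None ).

```python
# Pair delays in nS copied from Table 2: (simulated, measured).
# None marks the entry that is not reported in the table.
TABLE_2 = {
    ("39 stage ring osc.", "size 1"): (4.8, 4.32),
    ("39 stage ring osc.", "size 2"): (3.1, None),
    ("D-latch", "size 1"): (6.0, 7.80),
    ("D-latch", "size 2"): (4.0, 4.28),
    ("D-latch with Pr & Clr", "size 1"): (6.4, 6.28),
    ("D-latch with Pr & Clr", "size 2"): (4.3, 5.00),
}

def percent_deviation(simulated, measured):
    """Deviation of the measured delay from the simulated one, in percent."""
    return 100.0 * (measured - simulated) / simulated

if __name__ == "__main__":
    for (circuit, size), (sim, expt) in TABLE_2.items():
        if expt is None:
            print(f"{circuit:24s} {size}: no measurement reported")
        else:
            print(f"{circuit:24s} {size}: {percent_deviation(sim, expt):+.1f} %")
```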

3. DECIMATION FILTER USING MULTI-RATE ARRAY.
The conventional A/D converters have aimed at achieving high resolution by using
techniques like successive approximation or dual ramp conversion. These techniques
demand complicated calibration, anti-aliasing filters, accurate comparators and complex
sample-and-hold circuits to increase the resolution over 15 bits. Sigma-Delta converters, on
the other hand, do not require the anti-aliasing filters and the sample-and-hold circuits. The
basic block diagram of a Sigma-Delta converter system is as shown in Figure 17a.
Figure 17a. Basic block diagram of a sigma-delta modulator based A/D converter. The analog input feeds the analog modulator ( resolution = 1 bit, sampling frequency = 2-3 MHz ), followed by the low pass and decimation filter, which produces the digital output ( resolution = 15-16 bits, sampling frequency = 20-30 KHz ).
It comprises two major sections: 1) the analog modulator, which forms the front end of the
system, and 2) the digital low pass and decimation filters. The analog modulator quantizes
the input signal with very low resolution ( 1 bit ) and at a very high sampling rate ( a few
MHz ). The oversampled signal is then fed to a low pass filter to remove the out of band
noise. The low pass filter is followed by a series of decimation filters to reduce the output
sampling rate and to increase the resolution to 16 bits.
The main objective of this thesis is to design the decimation filter for a mixed-mode system
such as a Sigma-Delta A/D converter using a novel multi-rate systolic architecture and to consider its
implementation in Folded Source-Coupled Logic. With this in mind, this chapter is devoted
to the mapping of the decimation filter onto a multi-rate array. Chapter 4 considers the
implementation of the same in FSCL.

3.1 CONVENTIONAL APPROACH.
The most important application of systolic arrays is in the area of real-time digital signal
processing, as they offer the important advantages of modularity, regularity, local inter-processor
connections and highly pipelined processing. The procedure to map DSP
algorithms onto systolic arrays is to specify the algorithm in the form of a system of recurrence
equations ( SREs ). Using the SREs as the initial specification, a data dependency graph ( DG )
showing the dependencies between the various intermediate data is obtained. After
localizing the DG, it is mapped onto a processor array using the appropriate projection
vector and the scheduling ( timing ) function. An important feature of conventional systolic
arrays is that both the transparent and the computed data dependencies are transmitted with
equal delay time.
Figure 17b shows the block diagram of a decimation filter with decimation factor M.

Figure 17b. Block diagram of a decimation filter ( decimation factor = M ).

The algorithm for such a filter can be expressed as,

$$y(i) = \sum_{j=0}^{N} h(j)\,x(Mi - j)$$

Consider the case when N = 3 and M = 2. The recurrence equation ( RE ) corresponding to
the above algorithm is,
y(i,j) = y(i,j-1) + h(j).x(2i - j) ; 0 <= j <= N
y(i,-1) = 0 ; y(i,N) = y(i) ; initial & final condition
The preliminary dependency graph ( DG ) for the above RE can be obtained directly and is
shown in Figure 18. From the preliminary DG, one can infer that h(j) can be pipelined
along the i-axis, thus implying a dependency of the form,
h(i,j) = h(i-1,j)
with the initial condition h(-1,j) = 0.

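
The filter equation and its recurrence form given above can be checked against each other numerically. The sketch below is only illustrative: the function names are hypothetical, and treating x as zero outside the supplied samples is an assumption that the text does not state.

```python
def _sample(x, k):
    """Input sample x(k); assumed to be zero outside the given range."""
    return x[k] if 0 <= k < len(x) else 0

def decimate_direct(x, h, M):
    """y(i) = sum over j = 0..N of h(j) * x(M*i - j), with N = len(h) - 1."""
    n_out = (len(x) + M - 1) // M          # outputs with M*i inside the input
    return [sum(h[j] * _sample(x, M * i - j) for j in range(len(h)))
            for i in range(n_out)]

def decimate_recurrence(x, h, M):
    """Same result via y(i,j) = y(i,j-1) + h(j) * x(M*i - j)."""
    n_out = (len(x) + M - 1) // M
    y = []
    for i in range(n_out):
        acc = 0                            # initial condition y(i,-1) = 0
        for j in range(len(h)):            # 0 <= j <= N
            acc = acc + h[j] * _sample(x, M * i - j)
        y.append(acc)                      # final condition y(i) = y(i,N)
    return y

if __name__ == "__main__":
    x = [1, 2, 3, 4, 5, 6, 7, 8]
    h = [1, 1, 1, 1]                       # N = 3
    print(decimate_direct(x, h, M=2))
    print(decimate_recurrence(x, h, M=2))
```

Both functions accumulate the same products, so they agree for any input; the recurrence form simply exposes the index j along which the partial sums y(i,j) are built up.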
The above DG is incomplete in that the propagation of variable x is not shown. The
simplest way of propagating x is along the direction specified by 2i - j = C. This implies
that the node p(i,j) in the computation domain gets x from node (i-1,j-2). This is so because
the node (i-1,j-2) preserves the relationship 2.(i-1) - (j-2) = C and is also the nearest such node
to p(i,j). The new dependency imposed by the propagation of x in the 2i - j = C direction
is,
x(i,j) = x(i-1,j-2)
In general, for a decimation factor of M, this dependency would be,
x(i,j) = x(i-1,j-M)
The complete DG can now be obtained and is shown in Figure 19. The complete system
of recurrence equations ( SREs ) for the decimation filter can now be written as,
y(i,j) = y(i,j-1) + h(i,j).x(i,j)
h(i,j) = h(i-1,j)
x(i,j) = x(i-1,j-2)
3.1.1 DEPENDENCY VECTORS.
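The dependency vectors for the above SREs are given by,

$$D_{yy}:\quad A_1 = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \quad b_1 = \begin{bmatrix} 0 \\ -1 \end{bmatrix} \quad ; \text{ computed data dependency.}$$

$$D_{hh}:\quad A_2 = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \quad b_2 = \begin{bmatrix} -1 \\ 0 \end{bmatrix} \quad ; \text{ transparent data dependency.}$$

$$D_{xx}:\quad A_3 = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \quad b_3 = \begin{bmatrix} -1 \\ -2 \end{bmatrix} \quad ; \text{ transparent data dependency.}$$
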
3.1.2 PROJECTION VECTOR ( d ).
Because the computation domain for the given algorithm has a ray in the i direction ( meaning
that it is infinite in that direction ), the only possible projection direction is along i and
hence,

$$d = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$$

The corresponding processor space specified by $\Omega$ is orthogonal to d and is given by

$$\Omega = \begin{bmatrix} 0 \\ 1 \end{bmatrix}$$

Figure 18. The preliminary dependency graph ( DG ).

Figure 19. The complete dependency graph ( conventional approach ).

The allocation function a(p) which defines the mapping of the nodes in the computation
domain onto the processor space is given by,
$$a(p) = \Omega^{T} p = j$$
Thus node p(i,j) in DG would be mapped onto the processing element ( PE ) at j in the
processor space.
3.1.3 SCHEDULING FUNCTION ( s ).
A scheduling function s specifies the time sequence in which the various nodes in a DG
will be computed by the processor array.
It can be verified that

$$s = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$$

defines a valid timing function as follows:

$$s^{T} d = \begin{bmatrix} 1 & 1 \end{bmatrix} \begin{bmatrix} 1 \\ 0 \end{bmatrix} = 1,$$

which is greater than 0 thereby implying a conflict-free schedule.
The timing function t(p), which specifies the time at which the node p(i,j) is computed, is
given by,

$$t(p) = s^{T} p = i + j$$

With allocation function a(p) and timing function t(p) as defined above, it is possible to
map the DG of a decimation filter onto a processor array shown in Figure 20.
Figure 20. Conventional systolic array for decimation filter ( M = 2, N = 3 ). The array consists of PE0 through PE3 holding h(0) through h(3), with separate x(even) and x(odd) paths and output y.

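
The allocation and timing functions of the conventional mapping can be checked on a small index set. The sketch below assumes the vectors implied by the text ( d = [1 0], processor space vector [0 1], s = [1 1] ) and the dependency offsets b1, b2, b3 of Section 3.1.1; the helper names are hypothetical.

```python
D_VEC = (1, 0)      # projection direction d
OMEGA = (0, 1)      # processor space vector, orthogonal to d
S_VEC = (1, 1)      # scheduling vector s

# Offsets b of the dependencies q = p + b (A is the identity for all three)
DEPENDENCIES = {"Dyy": (0, -1), "Dhh": (-1, 0), "Dxx": (-1, -2)}

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def allocation(p):
    """a(p): index of the PE that executes node p = (i, j)."""
    return dot(OMEGA, p)

def timing(p):
    """t(p): time step at which node p = (i, j) is computed."""
    return dot(S_VEC, p)

def check_mapping(n_outputs, N):
    """Return (conflict_free, delays, hops) for nodes 0 <= i < n_outputs, 0 <= j <= N."""
    nodes = [(i, j) for i in range(n_outputs) for j in range(N + 1)]
    slots = {(allocation(p), timing(p)) for p in nodes}
    conflict_free = len(slots) == len(nodes)      # one node per PE per time step
    delays = {k: -dot(S_VEC, b) for k, b in DEPENDENCIES.items()}
    hops = {k: abs(dot(OMEGA, b)) for k, b in DEPENDENCIES.items()}
    return conflict_free, delays, hops

if __name__ == "__main__":
    print(dot(S_VEC, D_VEC))            # s.d, must be positive
    print(check_mapping(n_outputs=6, N=3))
```

In this model every dependency has a positive delay, so the schedule is causal, while the x dependency spans two PEs per step.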
3.2 DEMERITS OF THE CONVENTIONAL APPROACH.
Two important points have to be noted from the DG and the processor array thus obtained.
First, the DG is not completely localized ( because node (i,j) depends on node (i-1,j-2) )
and hence the corresponding processor array is not completely systolic. Next, the odd and
even indexed x variables propagate along separate I/O paths in the processor array. This
implies that for a decimation factor of M, the corresponding processor array will have M
I/O paths to propagate the variable x. Consequently the processor array thus obtained is
impractical for large decimation factors. Another disadvantage of this array is that the
transparent data dependencies ( Dxx & Dhh ) and the computed data dependency ( Dyy )
are transmitted at the same rate. This implies a reduction in the overall computational speed.
The mapping of the decimation filter onto a multi-rate array, which eliminates all these
disadvantages, is considered next.
3.3 MULTI-RATE ARRAY APPROACH.
As mentioned before, in conventional systolic arrays both the transparent and computed
data dependencies are transmitted with equal delay times. For certain DSP algorithms like
decimation filters, the corresponding recurrence equations ( REs ) are directionally uniform
( DUREs ). For such applications, the general systolic array method of treating all the data
dependencies in similar fashion introduces a large latency and degrades the total
computation time. The computation time and hence efficiency for these algorithms can be
improved significantly by implementing them as multi-rate arrays[9]. These arrays are a
compromise between the global broadcast arrays and the conventional systolic arrays. In
this scheme the transparent data dependencies are transmitted K times faster than the
computed dependencies, thus reducing the latency and computation time by a factor of K.
3.3.1 DIRECTIONALLY UNIFORM RECURRENCE EQUATIONS.
Consider any algorithm for which the recurrence equation is of the form,
$$y(p) = f(\, y_1(q_1),\; y_2(q_2),\; y_3(q_3),\; \ldots \,)$$

The dependencies qj can be expressed as,

$$q_j = A_j\,p + b_j$$

The above recurrence equation is directionally uniform iff for all of its affine dependencies
the following holds true,

$$[\,A_j - I\,] \text{ is singular } ( \text{i.e. } |A_j - I| = 0 )$$

and in that case it would be termed a Directionally Uniform Recurrence Equation ( DURE ).
3.3.2 MULTI-RATE ARRAY FOR THE DECIMATION FILTER.
The basic recurrence equation for the decimation filter ( assuming M = 2) is,
y(i,j) = y(i,j-1) + x(2i -j).h(j)
The recurrence equations for the given algorithm can also be expressed as,
y(i,j) = y(i+1,j-1) + x(2i,j).h(i,j) ; implies affine dependency Dxy.
h(i,j) = h(i-1,j) ; implies uniform dependency Dhh.
x(i,j) = x(i+1,j-1) ; implies uniform dependency Dxx.
The above REs imply the propagation of h along the j direction and that of x and y along the
( j - i ) direction. The DG for the above REs is shown in Figure 21.
Figure 21. Dependency graph for decimation filter ( multi-rate approach ).
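As a cross-check on the basic recurrence y(i,j) = y(i,j-1) + x(2i - j).h(j), the following is a minimal sketch in Python that evaluates it directly, one output sample at a time. The function name decimate, the generalisation from 2 to an arbitrary factor M, and the treatment of samples outside the supplied input as zero are assumptions made for illustration only.

```python
def decimate(x, h, M=2):
    """Evaluate y(i) = sum over j of x(M*i - j) * h(j).

    x and h are lists of numbers. Samples of x outside the supplied
    range are treated as zero (an assumption of this sketch).
    """
    last_index = len(x) + len(h) - 2      # last index of the full convolution
    n_out = last_index // M + 1           # outputs kept after decimation by M
    y = []
    for i in range(n_out):
        acc = 0                           # y(i, -1) = 0
        for j, hj in enumerate(h):
            k = M * i - j                 # index of the x sample used at step j
            if 0 <= k < len(x):
                acc += x[k] * hj          # y(i, j) = y(i, j-1) + x(M*i - j) * h(j)
        y.append(acc)
    return y


print(decimate([1, 2, 3, 4, 5, 6], [1, 1, 1, 1]))   # [1, 6, 14, 15, 6]
```

Only every M-th sample of the full convolution is computed, which is the saving the decimation filter arrays in this chapter exploit.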
3.3.3 DEPENDENCY VECTORS.
The dependency vectors for the above SREs are given by,
Dyy: A1 = | 1 0 |   b1 = |  1 |   ; uniform data dependency.
          | 0 1 |        | -1 |
Dxy: A2 = | 2 0 |   b2 = | 0 |    ; affine data dependency.
          | 0 1 |        | 0 |
Dhh: A3 = | 1 0 |   b3 = | -1 |   ; uniform data dependency.
          | 0 1 |        |  0 |
Dxx: same as Dyy.
Consider the affine dependency Dxy for which
[ A2 - I ] = | 1 0 |
             | 0 0 |
Obviously | A2 - I | = 0 and hence [ A2 - I ] is singular and therefore the given system of
REs for the decimation filter algorithm falls under the class of DUREs. The general
synthesis methodology for mapping such algorithms onto multi-rate arrays[9] can now be
used in our case to determine the projection direction and the timing function.
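The singularity test can be checked mechanically. The following is a minimal sketch in Python, assuming a two-dimensional index space ( i, j ) and the A matrices of the dependencies listed above. The helper names det2 and is_directionally_uniform are illustrative and are not taken from [9].

```python
def det2(m):
    """Determinant of a 2x2 matrix given as a list of two rows."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def is_directionally_uniform(A):
    """Return True when [A - I] is singular, i.e. |A - I| = 0."""
    a_minus_i = [[A[r][c] - (1 if r == c else 0) for c in range(2)]
                 for r in range(2)]
    return det2(a_minus_i) == 0


# A matrices of the dependencies of the decimation filter ( M = 2 )
dependencies = {
    "Dyy": [[1, 0], [0, 1]],
    "Dxy": [[2, 0], [0, 1]],
    "Dhh": [[1, 0], [0, 1]],
}

for name, A in dependencies.items():
    print(name, is_directionally_uniform(A))
```

Every dependency passes the test, so the system qualifies as a DURE. A matrix such as [[2, 0], [0, 2]] fails it, because [ A - I ] is then the identity.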
3.3.4 UNIFORM HYPERPLANES ( C ).
Because [ A2 - I ] is singular, it can be expressed as,
[ A2 - I ] = μ.π^T = | 1 |.[ 1 0 ]
                     | 0 |
Uniform hyperplanes are the planes along which the dependencies are constant and are
given by,
C = π^T.p = i
Thus in our case the dependencies are constant along the planes i = C ( i.e. the planes
parallel to the j-k plane ).
3.3.5 PROJECTION VECTOR ( μ ).
The processors for the multi-rate array have to be projected along the direction specified by
μ onto the uniform hyperplanes defined by C. The projection direction d is therefore
given by,
d = μ = | 1 |
        | 0 |
The processor space SI is parallel to the planes specified by the uniform hyperplanes i = C
and orthogonal to the projection direction d . The processor allocation function a(p) is
given by,
a(p) = e2^T.p = j
3.3.6 MULTI-RATE CLOCK RATIO ( K ).
The multi-rate clock ratio K, defined as the ratio of the speed at which transparent
dependencies are transmitted to that of the computed dependencies, is given by,
K = 1 + π^T.μ = 2
Thus in our example the transmitted variable x will be propagated 2 times faster than the
computed variable y. In general for a decimation factor of M, the multi-rate clock ratio will
satisfy the relation, K = M.
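The relation K = M can be illustrated numerically. The sketch below is only one reading of the construction above. It assumes that, for a decimation factor M, the affine dependency Dxy has A2 = [ M 0 ; 0 1 ], that the rank-one matrix [ A2 - I ] is factored as a column vector times the row vector [ 1 0 ], and that K is one plus the inner product of those two factors. The function name clock_ratio and the variable names row and col are hypothetical.

```python
def clock_ratio(M):
    """Multi-rate clock ratio for decimation factor M (sketch under stated assumptions)."""
    A2 = [[M, 0], [0, 1]]                        # assumed affine dependency x(M*i, j)
    a_minus_i = [[A2[r][c] - (1 if r == c else 0) for c in range(2)]
                 for r in range(2)]
    row = [1, 0]                                 # normal of the hyperplanes i = C
    col = [a_minus_i[0][0], a_minus_i[1][0]]     # first column of [A2 - I]
    # Confirm the rank-one factorisation [A2 - I] = col . row
    for r in range(2):
        for c in range(2):
            assert a_minus_i[r][c] == col[r] * row[c]
    return 1 + sum(p * q for p, q in zip(row, col))


for M in (2, 3, 4, 8):
    print(M, clock_ratio(M))
```

Under these assumptions the inner product equals M - 1, so the ratio comes out as K = M for every decimation factor tried.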
3.3.7 TIMING FUNCTION.
Let the timing function associated with variable y be Ty and that with variable x be Tx.
Because the transmitted variable x is propagated K times faster than the computed variable
y, the relationship between Ty and Tx is,
Ty = K.Tx
The directions of the dependency arcs in Figure 21 do not allow the choice Ty = i. Note,
however, that because the dependency Dyy is an addition operation and Dxx a propagation
operation, the direction of the dependency arcs for both variables can be reversed using the
associative property.
Thus if Ty = i, then Tx = i/K.
Hence in our case, Ty = i and Tx = i/2.
Having defined the projection direction and the timing function, the dependency graph for
decimation filter shown in Figure 21 ( with dependency arcs for variables y and x reversed
) can be now mapped onto a multi-rate array as shown in Figure 22. Figure 23 shows some
snapshots for this array.
[ Figure 22: linear array of four processing elements PE0 to PE3 holding the coefficients h(3), h(2), h(1) and h(0) respectively. ]
Figure 22. Multi-rate processor array for decimation filter.
[ Figure 23: contents of the array at times t = 0, t = 1 and t = 2. ]
Figure 23. Snapshots for the multi-rate processor array shown in Figure 22.
4. IMPLEMENTATION OF THE PROCESSING ELEMENT USING FSCL.
The multi-rate array to implement the decimation filter with decimation factor M was
considered in the previous chapter. As was seen, this array comprises a number of
identical processing elements ( PEs ) connected together in a systematic fashion. Each of
these PEs is basically a multiplier-adder block as shown in Figure 24. Each PE
performs the operation yout = yin + ( h.x ), where h denotes the coefficient word and x the
data word.
Figure 24. Block diagram of the processing element ( PE ) in multi-rate array.
In this chapter the design and implementation of this multiplier-adder block using FSCL
logic gates is considered. The multiplier-adder thus designed is then configured as the
complete PE of the multi-rate array for the decimation filter.
Two possible ways to implement the multiplier-adder are the parallel-parallel approach and
the serial-serial approach. A parallel implementation would involve an array type multiplier
configuration and a fast carry-look-ahead adder. It would feature high speed of operation
but at the expense of increased hardware. Realizing the parallel implementation of the
multiplier-adder using FSCL gates would thus imply an increase in area as
well as in power consumption per bit of multiplication. A serial implementation, on the other
hand, would save hardware, area and power consumption but at the
expense of the speed of operation. Taking these factors into consideration, a serial-parallel
implementation of the multiplier-adder block was chosen. Also, taking into account the 2's
complement format used to represent the data ( x ) and the coefficients ( h ), Booth's
algorithm was chosen to perform the multiplication operation.
4.1 DESIGN OF A MULTIPLIER-ADDER.
Multiplication is usually implemented as some form of repeated additions. The operations
involved are shifting and addition. One approach would be to multiply the data ( x ) with
the coefficient (h) one bit at a time and adding the resulting terms. In our application, a
real-time signal processing application, high speed multiplication is desirable. Moreover,
the conventional method of multiplication presents some difficulties in treating the negative
numbers represented in 2's complement format. Booth's multiplication algorithm[10] for
2's complement numbers treats the positive and negative numbers uniformly and thereby
eliminates the need for correcting the result. Instead of multiplying the data ( x ) by one
coefficient ( h ) bit during each cycle, it multiplies by 2 coefficient bits. The method is
equivalent to recoding the coefficient ( h ) as a sequence of ternary digits ( -1, 0, +1) with
binary weighting. If we denote the sequence of bits in data x as,
X(k-1) X(k-2) ... X(2) X(1)
and the sequence of bits in coefficient h as,
H(k-1) H(k-2) ... H(2) H(1)
then the recoded sequence of bits in coefficient h is,
T(k) T(k-1) ... T(1) T(0)
where, T(i) = H(i-1) - H(i) is one of the ternary digits ( -1, 0, +1 ).
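The recoding can be tried out on small words. The following is a minimal sketch in Python, assuming the recoding rule T(i) = H(i-1) - H(i) with an implicit 0 to the right of the least significant bit. The function names booth_recode and booth_value are illustrative, and the sketch produces one ternary digit per coefficient bit, which is a simplification of the indexing used above.

```python
def booth_recode(h_bits):
    """Recode a 2's complement bit string (MSB first) into ternary digits.

    Returns the digits LSB first, each one of -1, 0, +1.
    """
    bits = [int(b) for b in reversed(h_bits)]   # LSB first
    prev = 0                                    # implicit bit right of the LSB
    digits = []
    for b in bits:
        digits.append(prev - b)                 # T(i) = H(i-1) - H(i)
        prev = b
    return digits


def booth_value(digits):
    """Value of a recoded word: sum of T(k) * 2^k (binary weighting)."""
    return sum(t * (1 << k) for k, t in enumerate(digits))


print(booth_recode("1001"))                # [-1, 1, 0, -1]
print(booth_value(booth_recode("1001")))   # -7
```

The recoded digits carry the same value as the original 2's complement word for positive and negative coefficients alike, which is why no correction step is needed.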
Multiplication proceeds by adding the negative, zero, or positive partial products formed by
multiplying each T(i) by the data word. Table 3 enumerates the possibilities. The additions
must, of course, be done in 2's complement arithmetic, using sign extensions when needed.
The table suggests that H(i) can be used to determine whether to add or subtract, while H(i)
XOR H(i-1) can be used to gate the data word. The complete action of add/subtract
data/zero is shown in Table 3 and can be implemented as,
H(i) XOR ( [ H(i) XOR H(i-1) ] AND X(j) )
which reduces to,
[ X(j) AND H(i-1) ] OR [ X'(j) AND H(i) ]
As can be seen, the above expression can be easily realized using a data selector. Also to
correctly implement the subtraction in 2's complement form we need to add the bit H(i) to
the above expression during the LSB computation. This can be taken care of by presetting a
carry-save flip-flop to H(i).
Original multiplier   Recoded multiplier   Gating              Action
H(i)   H(i-1)         T(i)                 H(i) XOR H(i-1)
0      0               0                   0                   add zero
0      1              +1                   1                   add data word
1      0              -1                   1                   subtract data word
1      1               0                   0                   subtract zero
Table 3. Booth's algorithm.
A single multiplier bit module which will perform the
multiplication operation assuming the data x as serial input and the coefficient h as parallel
input is as shown in Figure 25. Several comments about this multiplier bit module are in
order. As was described above, two coefficient bits H(i) and H(i-1) are needed per bit of
TC ( 2's complement ) / TC multiplication. Data bits X(j) select H(i) or H(i-1) through data
selector DS1 as prescribed by Booth's algorithm to form the gated data. A 1-bit full
adder combines the gated data, a previous partial product PP, and the saved carry from the
previous addition. The resulting sum is saved in a partial sum flip-flop and the resulting
carry in the carry-save flip-flop. A data selector DS2 preceding the carry-save flip-flop
allows the presetting of the carry to a logical 1 or 0 under the control of the signal R. A data
selector DS3 preceding the partial sum flip-flop allows sign extensions to be performed
under the control of R. PO denotes the partial product bit generated by the multiplier bit
module. Figure 26 shows the block diagram of such a TC/TC multiplier bit. The main
advantage of such multiplier bit modules is that N of these can be cascaded in series to form
a 2's complement N bit/N bit multiplier. Of course a series of D flip-flops would be
required to synchronize the data word x and the control signal R to ensure correct
operation. Another advantage of generating the product serially is that the implementation
of the operation yout = yin + ( h.x ) merely requires a bit by bit serial addition of the word
yin with the product h.x. Figure 27 shows a complete 2's complement 4 bit/4 bit
multiplier-adder implemented using this scheme. Note that the data x ( 4 bit ) is fed serially,
the coefficient h ( 4 bit ) is applied in parallel and the final result yout = yin + ( h.x ), an 8
bit number, is generated serially. To verify the operation of this multiplier-adder, gate level
simulations for various combinations of x, h and yin were performed using QuickSim.
Figure 29 shows the simulation results for the case when x = 0 1 0 1 ( = + 5 ), h = 1 0 0 1
( = - 7 ) and yin = 1 1 1 1 1 0 1 0 ( = - 6 ). As can be seen, the result yout = 1 1 0 1 0 1 1 1
( = - 41 = yin + h.x ) is generated 5 clock cycles from the instant the x and yin inputs are
applied. In general, for an N bit/N bit multiplier-adder designed as above, the LSB of the
result yout will be obtained ( N + 1 ) clock cycles after the instant the LSBs of the
inputs are applied.
Figure 25. Logic diagram of a typical TC/TC multiplier bit.
Figure 26. Block diagram of the TC/TC multiplier bit.
Figure 27. Logic diagram of a 4 bit TC/TC serial-parallel multiplier-adder.
Figure 28. Block diagram of the 4 bit TC/TC serial-parallel multiplier-adder.
Figure 29. Simulation results ( QuickSim ) for 4 bit TC/TC multiplier-adder.
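The worked example can be reproduced at the arithmetic level. The sketch below is not a cycle-accurate model of the circuit in Figure 27. It only accumulates Booth partial products in ordinary integers, and it checks that the data-selector form of the gating expression agrees with the XOR form. All function names are illustrative assumptions.

```python
def from_tc(bits):
    """2's complement bit string (MSB first) to integer."""
    value = int(bits, 2)
    return value - (1 << len(bits)) if bits[0] == "1" else value


def to_tc(value, n):
    """Integer to an n bit 2's complement bit string (MSB first)."""
    return format(value & ((1 << n) - 1), "0{}b".format(n))


def gated_bit(hi, hi_1, xj):
    """Data-selector form: X(j) selects H(i-1), otherwise H(i)."""
    return (xj & hi_1) | ((1 - xj) & hi)


def multiply_add(x_bits, h_bits, yin_bits):
    """Return yout = yin + h.x as a 2's complement bit string."""
    x = from_tc(x_bits)
    acc = from_tc(yin_bits)
    prev = 0                                  # implicit bit right of the LSB of h
    for k, ch in enumerate(reversed(h_bits)):
        hk = int(ch)
        t = prev - hk                         # Booth digit: -1, 0 or +1
        acc += t * x * (1 << k)               # add, skip or subtract shifted data word
        prev = hk
    return to_tc(acc, len(x_bits) + len(h_bits))


print(multiply_add("0101", "1001", "11111010"))   # 11010111
```

For x = +5, h = -7 and yin = -6 the sketch returns 1 1 0 1 0 1 1 1, that is - 41, in agreement with the simulated result quoted above.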
4.2 IMPLEMENTATION OF MULTIPLIER-ADDER IN FSCL.
As can be seen from the logic diagrams in Figures 25 and 27, the primary digital blocks
required to realize the multiplier-adder thus designed are an AND-OR gate, a data selector,
a full adder, and D flip-flops ( edge triggered ). Implementation of each of these blocks
would normally require multiple primitive logic gates. Using the series-gating technique
outlined in Chapter 2, however, each of them can be realized as a single complex logic gate,
as shown in Figures 30 through 32. Note that the multiplier bit module ( Figure 25 )
features a D flip-flop preceded by a data selector at the D input. The data selector and the
D flip-flop can also be combined into a single complex FSCL gate as shown in Figure 33
using the series-gating technique.
As was observed from the power-delay curves for the FSCL gates ( Chapter 2 ), the speed
of the FSCL gates does not show any significant improvement for power consumption
greater than 1 mW/gate. The complex FSCL gates required to realize the multiplier-adder
were thus sized so that the power consumption is 1 mW/gate. The total power dissipation
of a 4 bit/4 bit multiplier-adder realized using FSCL gates is 86 mW. To verify the
operation of the multiplier-adder implemented in FSCL, simulations ( transistor level ) were
performed for various combinations of x, h and yin using MSPICE. Figure 34 shows the
simulation results for the same combination of x, h and yin as in Figure 29. As can be seen
the result yout generated in both the cases ( gate level simulations and the transistor level
simulations ) is identical. Simulation results show that the magnitude of the VDD current
spikes generated for the multiplier-adder is 220 µA, which is considerably lower than that
generated by a minimum size static CMOS inverter ( 500 µA ). The 4 bit/4 bit multiplier-
adder thus designed and simulated has been fabricated using the MOSIS 2 µm p-well
process. The chip area occupied by this structure is 1410 µm x 1532 µm.
Figure 30. AND-OR gate.
Figure 31. Data selector ( 2 : 1 MUX ).
Figure 32a. Full adder carry bit.
Figure 32b. Full adder sum bit.
Figure 33. D-Latch with data selector as D input.
4.3 MULTIPLIER-ADDER AS THE PE FOR DECIMATION FILTER.
The multiplier-adder as shown in Figure 27 cannot be used directly as the PE for the
decimation filter. The synchronization of the inputs and outputs is necessary so that the
resultant multi-rate array can process the incoming serial data stream correctly.
Synchronization of the inputs and outputs implies that the LSBs of the inputs ( outputs )
should enter ( leave ) the PE at the same time instant. The time interval between these two
instants is termed the latency ( L ) of the PE. To determine the number of flip-flops to
be appended at the outputs of the multiplier-adder ( Figures 27 and 28 ), the following needs to
be observed from the input/output waveforms shown in Figure 29 ( or Figure 34 ):
1) Delay between xin and yin is 0.
2) Delay between xout and yout is 2 clock cycles.
3) Delay between the control signal R and xin is 3 clock cycles.
Other factors to be considered are:
1) yout is an 8 bit result.
2) The multi-rate array derived for the decimation filter requires the transparent data x to be
transmitted K = M times faster than the computed data y to ensure correct operation and
high throughput.
3) The control signals should be periodic to facilitate practical implementation of the filter.
With the above factors in mind, it was determined to append 11 flip-flops at the x output
and 13 flip-flops at the y output. Figure 35 shows the logic diagram of the complete PE
thus obtained. It can be easily verified that the latency L, as defined earlier, for such PE is
19 clock cycles.
The load signal ( LD ) in conjunction with the clock and some additional logic facilitates the
serial loading of the coefficients h into the PE. This allows the user to design a filter with
programmable coefficients and also reduces the number of I/O pins required for the
practical implementation of the same. Note that the loading of the coefficient into the PE
needs to be done only once during initialization, and that it does not affect the latency of the
PE. Also note that signal CK2 is generated as,
CK2 = ( CK1 ) AND ( Phiy ).
Phiy is a periodic signal which allows the user to set the decimation factor M of the filter.
In general the period of Phiy = M.L, where M is the desired decimation factor and L the
latency of the PE.
Figure 35. Configuration of the 4 bit multiplier-adder as the PE.
The PEs thus designed can be directly cascaded to form a fully programmable decimation
filter.
4.4 THE COMPLETE DECIMATION FILTER.
Figure 36 shows the logic diagram of the complete decimation filter implemented in the form of
a multi-rate array using the PEs thus designed. The output yout generated by each PE
needs to be 'cleaned' by a 'mask' signal so that the next stage receives only the correct
serial stream of bits. The decimation filter thus designed is fully programmable as it allows
the user to set the decimation factor M, the coefficients h, and the number of bits N used
for data representation.
Figure 36. The complete decimation filter.
Figure 37 shows the simulation results ( QuickSim ) for the decimation filter with the
following specifications,
h(0) = h(1) = h(2) = 1 1 0 1 = - 3
x(0) = x(1) = ... = 1 1 1 1 = - 1
decimation factor = M = 2.
Input data x is fed periodically every 19 ( L ) clock cycles, and the output yout is generated
serially every 38 ( 2.19 = M.L ) clock cycles.
The decimation filter thus designed can process the data at the rate r given by,
r = Fmax / L, where Fmax is the maximum operating frequency of the logic
circuits used to realize the filter.
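The timing relations above can be collected in a few lines. The following is a minimal sketch in Python using the latency L = 19 and decimation factor M = 2 of the example. The function name filter_timing and the clock frequency of 19 MHz are hypothetical and chosen only to give round numbers; the value of Fmax for the fabricated circuit is not assumed here.

```python
def filter_timing(latency, M, f_max_hz):
    """Timing figures of the decimation filter built from PEs of a given latency."""
    return {
        "input_period_clocks": latency,        # a new x word every L clock cycles
        "output_period_clocks": M * latency,   # a new yout word every M.L clock cycles
        "data_rate_hz": f_max_hz / latency,    # r = Fmax / L
    }


timing = filter_timing(latency=19, M=2, f_max_hz=19e6)   # 19 MHz is illustrative only
print(timing["input_period_clocks"])    # 19
print(timing["output_period_clocks"])   # 38
print(timing["data_rate_hz"])           # 1000000.0
```

The output period M.L also equals the period of the signal Phiy described in section 4.3.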
[ Figure 37: waveforms of ck1, clr, LD, h(2), h(1), h(0), phiy, ck2, R, xin, xout, yout and mask. ]
Figure 37. Simulation results ( QuickSim ) for the complete decimation filter ( M = 2 ).
5. CONCLUSIONS.
The widely used static CMOS logic family is highly unsuitable for mixed-mode
applications due to the large overlap current spikes generated during the switching
transitions. The switching noise thus generated is coupled to the analog section of the
mixed-mode IC through the common substrate and severely degrades the dynamic range and
phase linearity of the overall system.
In this thesis, a series of complex digital blocks have been designed using the newly
developed Folded Source-Coupled Logic ( FSCL ). The main feature of this logic family is
the low overlap current spikes generated during the switching transitions. Test structures
like ring oscillators ( using 2 input NAND gates ) and a cascade of D-Latches have been
fabricated using the 2 µm p-well CMOS process to verify the functionality and characterize
the performance of these gates.
Further, a decimation filter has been designed using the novel Multi-Rate systolic arrays. A
2's complement 4 bit/4 bit multiplier-adder, which can be easily configured as a processing
element of the decimation filter, has been designed and implemented in FSCL. It has been
fabricated using the 2 µm p-well CMOS process to test its functionality and performance.
The decimation filter designed using the multi-rate systolic architecture and implemented in
FSCL can be used in mixed-mode applications like Sigma-Delta A/D converters to improve
their performance characteristics at higher sampling rates.
BIBLIOGRAPHY.
[1] J.T. Wallmark, "Noise Spikes in Digital VLSI Circuits," IEEE Transactions on
Electron Devices, vol. ED-29, pp. 451-458, March 1982.
[2] H.S. Lee, D.A. Hodges and P.R. Gray, "A Self-Calibrating 15 Bit CMOS A/D
Converter," IEEE Journal of Solid-State Circuits, vol.SC-19, no.6, pp. 813-819,
December 1984.
[3] M. Shoji, CMOS Digital Circuit Technology, Prentice-Hall, Englewood Cliffs, NJ,
1988.
[4] T. Tripp and B. Hall, "Good design methods for quiet high-speed CMOS noise
problems," EDN, pp. 229-236, October 1987.
[5] B.J. Hosticka and W. Brockherde, "The art of analog circuit design in a digital VLSI
world," IEEE Symposium in Circuit and System Proceeding, pp. 1347-1350, May 1990.
[6] S. Kiaei, S. Chee and D. Allstot, "CMOS Source-Coupled Logic for Mixed-Mode
VLSI," ICASSP, May 1990.
[7] S. Chee, "CMOS differential logic techniques for mixed-mode applications," Tech.
Rept, Dept. of Electrical and Computer Engineering, OSU, July 1990.
[8] C.S. Choy and P.L. Jones, "Minimization technique for series-gated emitter-coupled
logic," IEE Proceedings, vol.136, pt.G, no.3, June 1989.
[9] S. Kiaei and L. Alihua, "VLSI design of Multi-Rate Arrays for DSP algorithms,"
ICASSP, April 1990.
[10] Jack Kane, "A low-power, bipolar, 2's complement serial pipeline multiplier chip,"
IEEE Journal of Solid-State Circuits, pp. 669-675, October 1976.