Multirate as a hardware paradigm by Stevens, Kenneth & Suter, Bruce W.
MULTIRATE AS A HARDWARE PARADIGM 
B. W. Suter 
Air Force Res. Lab/IFGC 
525 Brooks Rd 
Rome NY 13441 
K. S. Stevens 
Intel 
Strategic CAD Labs 
Portland, OR 97124 
ABSTRACT 
Architecture and circuit design are the two most effective 
means of reducing power in CMOS VLSI. Mathematical 
manipulations based on ideas from multirate signal 
processing have been applied to create high performance, 
low power architectures. To illustrate this approach, two 
case studies are presented: one concerns the design of a 
fast Fourier transform (FFT) device, while the other 
concerns the design of analog-to-digital converters. 
1. INTRODUCTION 
While performance remains a primary figure of merit for 
any CMOS design, the increased transistor count and smaller 
feature size of each process generation are elevating the 
importance of power, skew, process variations, and the 
capacitance of non-local communication. As design sizes 
increase, the ability to view a die as a unified circuit 
controlled by a single frequency becomes less viable. 
We feel that future architectures will be modular, with each 
section containing its own frequency domain, and with 
sections communicating not over bidirectional shared busses 
but via point-to-point unidirectional communication links. 
Such a design has significant power and potential 
performance advantages when compared to "traditional" 
architectures. 
We decided to focus on new design approaches in order 
to investigate bringing the formal mathematics of signal 
processing to bear on new design realities. A fast Fourier 
transform (FFT) architecture, which is efficient in terms of 
performance and power, has been created as a first case study. 
Then, to show the generality of this approach, an 
analog-to-digital converter design is considered as a second 
case study. 
S. R. Velazquez 
V Company 
388 Ocean Ave, Ste 1613 
Revere Beach, MA 02151 
T. Nguyen 
Boston University 
Elec. & Comp. Eng. 
Boston, MA 02215 
2. CASE STUDY: FAST FOURIER TRANSFORM 
The following section describes a successful case study that 
was undertaken to utilize concepts from multirate signal 
processing and asynchronous circuit design to achieve a low 
power, high performance design. The fast Fourier transform 
was chosen since it is an algorithm that requires globally 
shared results. 
Our mathematical approach is hierarchically formed and 
expressed in terms of the $W_N = \exp(-j 2\pi / N)$ notation, as 
shown in Equation 1. The derivation can be found in [7]. 
$$X_{m_1}(m_2) = \sum_{n_2=0}^{N_2-1} W_N^{m_1 n_2} \left[ \sum_{n_1=0}^{N_1-1} x_{n_2}(n_1) \, W_{N_1}^{m_1 n_1} \right] W_{N_2}^{m_2 n_2} \qquad (1)$$
This notation represents $N_2$ FFTs of $N_1$ values each as the 
inner summation, whose results are scaled and then used to 
produce $N_1$ FFTs of $N_2$ values each. The total operation 
achieves the desired FFT of size $N = N_1 N_2$. 
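The two-stage structure of Equation 1 can be checked numerically. The following is a minimal software sketch, not the hardware implementation: it assumes the index mapping n = N2*n1 + n2 on the input and k = m1 + N1*m2 on the output (the usual mapping for this kind of decomposition, not stated explicitly above), and the function name `two_stage_dft` is hypothetical.

```python
import numpy as np

def two_stage_dft(x, N1, N2):
    """Length-N DFT (N = N1*N2) computed as in Equation 1.

    Assumed index mapping: input n = N2*n1 + n2, output k = m1 + N1*m2.
    """
    x = np.asarray(x, dtype=complex)
    N = N1 * N2
    assert x.size == N
    # Row n2 holds the decimated sequence x_{n2}(n1) = x(N2*n1 + n2).
    rows = x.reshape(N1, N2).T                 # shape (N2, N1), index [n2, n1]
    # Inner sums: N2 transforms of N1 values each, giving index [n2, m1].
    inner = np.fft.fft(rows, axis=1)
    # Scale by the constants W_N^(m1*n2).
    n2 = np.arange(N2).reshape(N2, 1)
    m1 = np.arange(N1).reshape(1, N1)
    scaled = inner * np.exp(-2j * np.pi * m1 * n2 / N)
    # Outer sums: N1 transforms of N2 values each, giving index [m2, m1].
    outer = np.fft.fft(scaled, axis=0)
    # Row-major flattening places outer[m2, m1] at k = m1 + N1*m2.
    return outer.reshape(N)

x = np.arange(16.0)
print(np.allclose(two_stage_dft(x, 4, 4), np.fft.fft(x)))
```

The comparison against `np.fft.fft` is only a consistency check on the formula; it says nothing about the power or timing behavior of the architecture.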
Historically, equations for FFT systems similar to our 
approach have been developed for two applications. In the 
mid-1960s, the problem of computing the FFT of a vector 
too large to fit in main memory was addressed. 
An approach similar to that presented here was created to 
limit the storage requirements in these early systems [2]. 
A second similar approach was developed in the 1980s for 
multiprocessor applications of the FFT algorithm [1]. The 
underlying architectures created from these equations are 
vastly different from that achieved here. 
The goal of this case study was to take a common 
application area and investigate novel formal architectural 
approaches that deliver low power and high performance 
under the additional constraints that we project future 
designs will face. We therefore emphasized pipelining, 
increased localization, hierarchy, and multiple frequency 
domains in our formulation, attempting to push the critical 
path into concurrent lower-frequency domains to support 
high performance. 
The multiplicative complexity of our approach is the 
same as that of the conventional Cooley-Tukey FFT 
formulation, which is $O(N \log N)$. However, our approach 
permits localized computations, as opposed to globally 
computing butterflies. This in turn suggests a low power 
silicon implementation, which is shown in Figure 1. 
The multirate formulation of this algorithm has resulted 
in an implementation parallelized in a pipelined fashion. 
0-7803-5041-3/99 $10.00 © 1999 IEEE 1885 









Figure 1: Low Power FFT Architecture 
Each "row" in the architecture contains point-to-point uni-
directional data pipelines. The entire design is implemented 
using asynchronous finite state machines for control. 
Due to decimation, each horizontal track in the architecture 
operates at a fraction of the initial sample rate: an 
average cycle time of 160 ns for a 10 ns input sample 
period, a ratio of 1/16. The asynchronous design methodology 
allows the rate division to occur locally, with much of the 
circuit idle and consuming only leakage current once its 
operation is complete. 
The down arrow blocks of Figure 1 are decimators [6]. 
The output of the M-fold decimator for a sampled signal 
x(n) is given by y(n) = x(Mn). The bank of decimators is 
effectively a demux operation in which each output is 
selected in order. 
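As a small illustration of the decimator definition y(n) = x(Mn) and of the demux view of a bank of decimators, consider the sketch below. The function names are hypothetical, and the assumption that branch r carries x(Mn + r) is the usual polyphase convention rather than something stated above.

```python
def decimate(x, M):
    """M-fold decimator: y(n) = x(M*n)."""
    return x[::M]

def demux(x, M):
    """Bank of M decimators; branch r is assumed to carry x(M*n + r)."""
    return [x[r::M] for r in range(M)]

samples = list(range(8))
print(decimate(samples, 4))   # [0, 4]
print(demux(samples, 4))      # [[0, 4], [1, 5], [2, 6], [3, 7]]
```

Each branch sees only every M-th input sample, which is why the downstream logic can run at 1/M of the input rate.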
Each of the $N_1$ and $N_2$ blocks represents another FFT 
operation, which can be a hierarchical instantiation of the 
structure in the figure, where the product $N_1 \times N_2$ at the 
lower level equals $N_1$ or $N_2$ at the next higher level in 
the hierarchy. 
The product blocks multiply a stream of results coming 
from the $N_1$-point FFT units by a set of constant values. 
Both constants and results are complex numbers, requiring 
four multiplications and two additions per sample. The 
constants are given by $W_N^{m_1 n_2}$, where 
$m_1 = 0, \ldots, N_1 - 1$ and $n_2 = 0, \ldots, N_2 - 1$. 
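The operation count for the product block follows from expanding a complex product into real arithmetic. The sketch below is illustrative only; `twiddle` and `complex_multiply` are hypothetical names, and the code says nothing about how the multipliers are realized in silicon.

```python
import cmath

def twiddle(m1, n2, N):
    """Constant W_N^(m1*n2), with W_N = exp(-j*2*pi/N)."""
    return cmath.exp(-2j * cmath.pi * m1 * n2 / N)

def complex_multiply(a_re, a_im, b_re, b_im):
    """(a_re + j*a_im) * (b_re + j*b_im) with explicit real operations.

    Uses four real multiplications and two real additions/subtractions.
    """
    re = a_re * b_re - a_im * b_im
    im = a_re * b_im + a_im * b_re
    return re, im

w = twiddle(1, 3, 16)
re, im = complex_multiply(0.5, -2.0, w.real, w.imag)
print(abs(complex(re, im) - (0.5 - 2.0j) * w) < 1e-12)   # True
```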
The large pipeline switch maps results from the product 
block to the $N_2$ FFT units. The $N_2$ FFT units take a 
transform of time displaced Fourier transform samples. Each 
$N_1$-point FFT provides one data sample to each of the 
$N_2$-point FFT units, the first row providing the first sample. 
Chip                    Power per transform 
DSP-24 (DSP Arch.)      143 µJ/transform 
SPIFFEE-1 (Stanford)    50 µJ/transform 
spaceFFT                97 µJ/transform 
earthFFT                18 µJ/transform 
Table 1: FFT Power Comparison 
Chip                    Throughput 
DSP-24 (DSP Arch.)      48k transforms/sec 
SPIFFEE-1 (Stanford)    33k transforms/sec 
Our design              25,000k transforms/sec 
Table 2: FFT Performance Comparison 
A stream of data $X_0(m_2), \ldots, X_{N_1-1}(m_2)$ is output by 
the $N_2$ FFT units to an array of expanders [6]. The output 
frequency of the expanders increases $N_1$-fold, with each 
expander cell providing a single data sample. 
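The rate increase at the output can be sketched in a few lines. This is a hedged illustration: `expand` follows the standard definition of an M-fold expander (zero insertion), while `interleave` assumes that expander cell r feeds output position r in each group of M samples, which is an assumption about how the cells are combined rather than a detail given above. Both names are hypothetical.

```python
def expand(x, M):
    """M-fold expander: y(M*n) = x(n), zero at all other positions."""
    y = [0] * (len(x) * M)
    y[::M] = x
    return y

def interleave(streams):
    """Merge M low-rate streams into one stream at M times the rate.

    Stream r is assumed to supply output samples r, r + M, r + 2M, ...
    """
    M = len(streams)
    out = [0] * (M * len(streams[0]))
    for r, s in enumerate(streams):
        out[r::M] = s
    return out

print(expand([1, 2, 3], 2))            # [1, 0, 2, 0, 3, 0]
print(interleave([[0, 2], [1, 3]]))    # [0, 1, 2, 3]
```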
This case study illustrates the potential benefits of a 
mathematical approach to low power design implemented 
using self-timed methodologies. We have designed and 
submitted to MOSIS a circuit containing the FFT-4 logic using 
a radiation tolerant cell library in 0.8 µm CMOS. The power 
consumption of the fabricated FFT-4 and the completed 
implementation of the FFT-16 have been used to estimate the 
overall power efficiency of a 1024-point FFT. These results 
are shown in comparison with other FFT designs in Table 1. 
In addition to these reductions in power consumption, we 
achieved a remarkably high sustained throughput, as can be 
observed in Table 2. All designs are measured using a 3.3 V 
voltage source. 
Figure 2 projects out a single "row" of the architecture 
presented in Figure 1. This shows the data path from input 
sample to output in a 256-point FFT ($N_1 = N_2 = 16$). Note 
that the data path switches between frequency domains of 
100 MHz, 6.25 MHz, and 1.5625 MHz. The higher frequency 
domains, such as input decimation and output expansion, 
are zones of reduced concurrency, whereas the more 
concurrent operations function at decreased frequency. The 
ultra low frequency of the bulk of the circuit permits 
compact, low energy circuit implementations, such as 
ripple-carry adders rather than look-ahead or array type 
adders, for the summation and product blocks. 
[Figure 2: data path stages labeled with cycle times of 10 ns, 160 ns, and 640 ns] 
Figure 2: Data Path For Low Power FFT Architecture 
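The three frequency domains follow from successive rate division. The sketch below reproduces the arithmetic; the first division by 16 comes from the text, while the further division by 4 is an inference from the quoted frequencies (it would correspond to building each 16-point FFT from 4-point units) and should be read as an assumption.

```python
def frequency_domain(input_rate_hz, division):
    """Return (rate in Hz, cycle time in s) after dividing the input rate."""
    rate = input_rate_hz / division
    return rate, 1.0 / rate

input_rate = 100e6                       # 10 ns input sample period
domains = {
    "input/output": frequency_domain(input_rate, 1),
    "16-point row": frequency_domain(input_rate, 16),
    # Assumed: a further 4-fold division inside each 16-point FFT.
    "4-point unit": frequency_domain(input_rate, 16 * 4),
}
for name, (rate, cycle) in domains.items():
    print(f"{name}: {rate / 1e6:.4f} MHz, {cycle * 1e9:.0f} ns")
# input/output: 100.0000 MHz, 10 ns
# 16-point row: 6.2500 MHz, 160 ns
# 4-point unit: 1.5625 MHz, 640 ns
```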
3. CASE STUDY: ANALOG-TO-DIGITAL CONVERTER 
The Advanced Filter Bank Analog-to-Digital Converter (AFB 
ADC) is a breakthrough approach to very high-speed, 
high-resolution analog-to-digital conversion that improves 
the speed of conversion by up to six times the state of the 
art by using a parallel array of converters. The AFB ADC uses 
multirate filter bank signal processing to improve 
performance through a combination of frequency-division and 
time-division multiplexing. Because of its unique 
architecture, the AFB ADC is continuously upgradeable as new 
analog-to-digital converter chips become available, thereby 
maintaining its performance advantage. The architecture 
can provide parallel channel outputs at lower data rates to 
ease the processing requirements of the digitized samples. 
The architecture is amenable to a single-chip VLSI 
implementation to reduce size and power consumption. To 
further reduce power consumption, filters and channel ADCs 
can be implemented in low-power charge-domain processing [4]. 
We have built a 12-bit, 80 MSa/s hardware prototype for 
potential use in a universal, software-reconfigurable radio 
frequency (RF) receiver for cellular, satellite, or military 
communications. A 14-bit, 325 MSa/s system is currently being 
developed. 
Using a filter bank for analog-to-digital conversion is 
an unconventional application of the filter bank 
architecture that improves the speed and resolution of the 
conversion over the standard time-interleaved array 
conversion technique [3]. The term "Advanced Filter Bank" 
denotes a filter bank with both analog and digital filters; 
conventional filter banks employ only discrete-time, digital 
filters. The AFB ADC employs analog Decomposition 
filters/splitters, $D_k$, to split the wideband analog input 
signal into M channel signals. The channel signals are 
sampled at 1/M the effective sample rate of the system and 
converted to digital signals with n-bit ADCs. The digitized 
channel signals are upsampled by M and reconstructed via the 
digital Recombination filters, $R_k$. The effective sample 
rate of the system is M times that of the channel ADCs in the 
array, and the resolution is n bits, the same as that of the 
channel ADCs in the array. The Advanced Filter Bank 
architecture is illustrated in Figure 3. 
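The split, sample, upsample, and recombine sequence can be illustrated with a purely discrete-time stand-in. The sketch below is not the AFB ADC: it replaces the analog Decomposition filters with a two-tap Haar pair (an illustrative choice made here, not the filters used in the prototype), omits quantization, and uses M = 2. It only shows that channels running at 1/M of the rate can be recombined into the full-rate signal.

```python
import numpy as np

M = 2
s = 1 / np.sqrt(2)
# Haar pair: an illustrative discrete-time stand-in, not the AFB filters.
decomposition = [s * np.array([1.0, 1.0]), s * np.array([1.0, -1.0])]
recombination = [s * np.array([1.0, 1.0]), s * np.array([-1.0, 1.0])]

def split_and_sample(x):
    """Filter into M channels, then keep every M-th sample of each."""
    return [np.convolve(x, d)[::M] for d in decomposition]

def upsample_and_recombine(channels):
    """Upsample each channel by M, filter it, and sum the results."""
    total = 0.0
    for c, r in zip(channels, recombination):
        up = np.zeros(M * len(c))
        up[::M] = c
        total = total + np.convolve(up, r)
    return total

x = np.arange(8.0)
y = upsample_and_recombine(split_and_sample(x))
# For this filter pair the output is the input delayed by one sample.
print(np.allclose(y[1:1 + len(x)], x))   # True
```

With analog Decomposition filters the reconstruction is only approximate, which is why the design of the Recombination filters matters.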
The AFB significantly improves the speed and resolution 
of the conversion by attenuating the effects of gain and 
phase mismatches between the converters in the array, which 
otherwise severely limit the resolution of the system [5]. 
The AFB is expected to provide analog-to-digital conversion 
with resolution up to 14 bits at sample rates up to 
325 MSa/s, as shown in Figure 4. 
The goal in the design of AFBs is to design filters that 
approximate the perfect reconstruction conditions as closely 
as possible: distortion should be small (e.g., less than a 
tenth of a dB deviation from the ideal 0 dB) and aliasing 
error should not limit the resolution of the system (e.g., 
85-90 dB for a 14-bit system). Given the Decomposition 
filters/splitters, $D_k(s)$, these perfect reconstruction 
constraints can be solved for the frequency response of the 
ideal Recombination filters, $R_k(z)$. An efficient 
Recombination filter design algorithm based upon the Fast 
Fourier Transform is developed in [8]. A 
discrete-time-to-continuous-time ("Z-to-S") transform, which 
converts a perfect reconstruction (PR) discrete-time filter 
bank into a near-perfect reconstruction AFB, is also 
developed in [8]. 
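As a rough cross-check of the 85-90 dB figure, the standard formula for the ideal signal-to-quantization-noise ratio of an n-bit converter driven by a full-scale sine wave (approximately 6.02n + 1.76 dB) can be evaluated. Relating the aliasing target to this noise floor is an interpretation offered here, not a derivation taken from the text, and the function name is hypothetical.

```python
import math

def ideal_snr_db(bits):
    """Ideal SNR in dB of an n-bit quantizer with a full-scale sine input."""
    return 20 * math.log10(2 ** bits) + 10 * math.log10(1.5)

for n in (12, 14):
    print(n, round(ideal_snr_db(n), 2))
# 12 74.01
# 14 86.05
```

Keeping aliasing error in the 85-90 dB range therefore places it at or near the ideal 14-bit quantization noise floor.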
The technical efficacy of the AFB ADC has been demonstrated 
with a two-channel board-level prototype. The prototype 
provides full 12-bit performance at a sample rate of 
80 MSa/s, twice as fast as the state-of-the-art Analog 
Devices AD9042 converter ICs upon which the prototype is 
based. The measured performance of the 12-bit AFB ADC 
prototype is shown in Table 3. The prototype is a 
two-channel architecture that employs low order analog 
filters/splitters and two length-32 digital FIR filters. The 
board-level prototype measures 10 cm by 13 cm. Note that the 
prototype can easily be upgraded to provide 12-bit 
performance at sample rates from 130 MSa/s to 260 MSa/s by 
incorporating two to four Analog Devices AD6640 12-bit, 
65 MSa/s converter ICs. A 14-bit, 325 MSa/s system with 
dynamic range greater than 85 dB, based upon the Lucent 
Technologies 65 MSa/s, 14-bit ADC integrated circuit 
(CSP1152A), is currently being developed. 
Figure 3: Advanced Filter Bank Analog-to-Digital Converter Architecture 
Resolution   Speed     SFDR typ/min   SNR typ/min 
[bits]       [MSa/s]   [dB]           [dB] 
12           80        82/74          68/62 
Table 3: Measured Performance of AFB ADC Prototype 
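The upgrade figures quoted above follow from the rule that the effective sample rate is M times the channel rate. The short sketch below reproduces that arithmetic; the five-channel case for 325 MSa/s is an inference from the 65 MSa/s channel rate, not a configuration stated in the text.

```python
def effective_rate_msa(channel_rate_msa, channels):
    """Effective sample rate of an M-channel array, in MSa/s."""
    return channel_rate_msa * channels

for m in (2, 4, 5):
    print(m, effective_rate_msa(65, m))
# 2 130
# 4 260
# 5 325
```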
4. CONCLUSIONS 
Two examples of powerful architectures based on multirate 
signal processing have been discussed. The significantly 
decreased power consumption and dramatically increased 
throughput are the result of greater locality and increased 
parallelism. 
Figure 4: ADC designers face the challenge of trading off the 
resolution (in bits) of a conversion against its speed (in 
samples per second). The thick line indicates the state of 
the art in single-chip ADCs. 
5. REFERENCES 
[1] D. Bailey. FFTs in External or Hierarchical Memory. 
Journal of Supercomputing, 4:23-35, 1990. 
[2] W. Gentleman and G. Sande. Fast Fourier Transforms 
for Fun and Profit. In AFIPS Conference Proceedings, 
volume 29, pages 563-578, 1966. 
[3] A. Petraglia and S. K. Mitra. Analysis of Mismatch 
Effects Among A/D Converters in a Time-Interleaved 
Waveform Digitizer. IEEE Trans. on Inst. and Meas., 
40:831-835, 1991. 
[4] S. Paul. Analysis, design, and implementation of 
charge-to-digital converters. Technical report, 
Massachusetts Institute of Technology, Cambridge, MA, 
Master's Thesis, 1995. 
[5] S. R. Velazquez. Hybrid filter banks for analog/digital 
conversion. Technical report, Massachusetts Institute of 
Technology, Cambridge, MA, PhD Dissertation, 1997. 
[6] Bruce W. Suter. Multirate and Wavelet Signal 
Processing. Academic Press, 1997. 
[7] Bruce W. Suter and Kenneth S. Stevens. Low Power, 
High Performance FFT Design. In Proceedings of 
IMACS World Congress on Scientific Computation, 
Modeling, and Applied Mathematics, pages 99-104, 
1997. 
[8] S. Velazquez, T. Nguyen, and S. Broadstone. Design 
of Hybrid Filter Banks for Analog/Digital Conversion. 
IEEE Trans. on Signal Processing, 46:956-967, 1998. 
