VLSI partitioning of a 2-Gs/s digital spectrometer by Von Herzen, Brian
768 IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 26, NO. 5, MAY 1991 
VLSI Partitioning of a 2-Gs/s Digital Spectrometer 
Brian Von Herzen 
Abstract: A digital correlating spectrometer is described for
radio astronomy based on a custom GaAs digitizer and a custom 
micropipelined CMOS correlator. The digitizer quantizes at 2 
gigasamples per second (Gs/s) and 2-b resolution. A GaAs 
demultiplexer distributes the data into eight parallel streams of 
250 Ms/s each. The CMOS correlator operates at 250 Ms/s
using 20 mW per correlator lag. The correlator output is pro- 
cessed on a host microcomputer to create a 1-GHz spectrum of 
the input signal that can be displayed interactively. An 8 × 9-mm
chip is being developed in a 2-μm process that contains 320
correlator lags. The design is partitioned into GaAs and CMOS 
components according to the required throughput at each stage 
of the system. The fastest signals (2 GHz) are kept on the chip 
level to minimize delay, crosstalk, system noise, and power 
consumption. Moderate-speed signals (250 MHz) are driven by 
GaAs components. CMOS components are used where high- 
speed outputs are not required. A strong synergy between the 
correlator architecture and micropipelined CMOS technology 
improves the performance by an order of magnitude compared 
to existing designs. Preliminary correlator chips have been built 
and tested at 250 Ms/s; final chips are under design. 
I. INTRODUCTION 
ONE OF THE major challenges in developing a microelectronic
VLSI system is to successfully partition the design into
separate components. Designs must
satisfy simultaneous constraints on area, power, pins, 
throughput, and noise. This paper examines such a con- 
strained design: a digital correlating spectrometer opti- 
mized for radio astronomy. These spectrometers are at- 
tractive because of potentially low cost, small size, and 
high reliability relative to other spectrometers [1]. The 
correlating spectrometer is partitioned into GaAs and 
CMOS components to minimize high-speed signals on the 
circuit board and to eliminate high-speed outputs from 
the CMOS chips. The characteristics of micropipelined 
CMOS technology are reviewed, followed by a description 
of the spectrometer architecture. The CMOS correlator 
subsystem is examined in detail with a performance com- 
parison to existing correlators. 
II. MICROPIPELINED CMOS TECHNOLOGY
The CMOS chips in the spectrometer use micropipelining [2],
which refers to small and simple pipeline stages
placed in series. Each stage might have only one or two
logic gates between registers, maximizing the throughput
of the circuit. These stages can use asynchronous or
synchronous signaling [2].
Manuscript received September 20, 1990; revised January 10, 1991.
This work was supported by the National Science Foundation through
Grant AST 891-2705 to the Caltech Submillimeter Observatory.
The author is with Caltech Submillimeter Observatory, Hilo, HI
96720.
IEEE Log Number 9143193.
Fig. 1. Digital correlating spectrometer. (Block diagram labels: Analog, 0-1 GHz; 2 bits @ 2 Gs/s; Correlator; 320 bits @ 2 Gs/s; Accumulator; 32 bits; Fourier; Output.)
Micropipelined CMOS circuits can accept input signals 
of several hundred megahertz if driven by external non- 
CMOS devices. Micropipelined systems can keep up with 
these data rates if the maximum latency between stages is 
only one or two gates [3], and if nearest-neighbor communication
is used between functional blocks to minimize
capacitance (no global wires driven by CMOS transistors). 
System latency does increase for micropipelined systems, 
but many signal processing systems such as the digital 
spectrometer depend much more on throughput than on 
latency. Output signals in CMOS are usually restricted to 
tens of megahertz, due to limited pin-driving capability. 
These qualities make micropipelined CMOS attractive in 
architectures requiring high input bandwidths and circuit 
density but with low output bandwidths. The correlator 
subsystem of the digital spectrometer matches these char- 
acteristics. 
III. SPECTROMETER ARCHITECTURE
A correlating spectrometer computes a power spectrum 
from the Fourier transform of the autocorrelation of a 
signal [4]. Fig. 1 shows a block diagram of the signal 
processing involved. A 1-GHz analog signal is digitized at 
2 b and 2 gigasamples per second (Gs/s). For weak 
astronomical signals, a 2-b quantizer is nearly as sensitive 
as an ideal quantizer with many bits. A 2-b quantizer 
provides 87% of the ideal sensitivity for 0-dB signals with 
Gaussian noise [5].
0018-9200/91/0500-0768$01.00 © 1991 IEEE
Fig. 2. Correlator architecture. (Diagram labels: 2 bits @ 2 Gs/s input; 1:8 demux/distributor (2-bit data path); signal numbers 0-7; XC: 40-lag cross-correlator and accumulator, for example delayed signal 1 correlated with undelayed signal 3.)
The correlator takes the digitized signal $s(t)$ and computes
the autocorrelation $\sigma(\tau)$:
$$\sigma(\tau) = \sum_{t} s(t)\, s(t - \tau).$$
Each discrete value of $\tau$ corresponds to a single correlator
"lag." A correlator with 320 lags will produce a power
spectrum with 320 channels, neglecting windowing effects. 
Ripple counters accumulate the outputs of the correlator. 
Since the integration time is long, the bandwidth out of 
the counters is reduced to a few hundred hertz. If the 
correlator and accumulator are integrated onto a single 
chip, the high-frequency outputs are eliminated. This 
integration permits the CMOS chip to run at 250 MHz 
without having to generate output signals at 250 MHz. 
A Fourier transform converts the accumulated autocor- 
relation function into a power spectrum. This computa- 
tion can be done in software since the correlator produces 
320 data points every few seconds. 
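The processing chain described in this section can be sketched in a few lines of software. The example below is only an illustration under stated assumptions, not the author's host code: the function names `autocorrelation` and `power_spectrum` are hypothetical, the transform is a plain cosine sum rather than an optimized FFT, and no windowing is applied.

```python
import math


def autocorrelation(samples, num_lags):
    """Sum over t of s(t) * s(t - tau), for tau = 0 .. num_lags - 1."""
    n = len(samples)
    return [
        sum(samples[t] * samples[t - tau] for t in range(tau, n))
        for tau in range(num_lags)
    ]


def power_spectrum(acf):
    """Cosine transform of an autocorrelation function.

    The autocorrelation is treated as symmetric about lag 0, so
    channel k corresponds to k / (2 * len(acf)) cycles per sample.
    """
    n = len(acf)
    spectrum = []
    for k in range(n):
        total = acf[0]
        for tau in range(1, n):
            total += 2.0 * acf[tau] * math.cos(math.pi * k * tau / n)
        spectrum.append(total)
    return spectrum


if __name__ == "__main__":
    # A tone at 0.125 cycles per sample should peak in channel 4 of 16.
    tone = [math.cos(2 * math.pi * 0.125 * t) for t in range(4096)]
    spec = power_spectrum(autocorrelation(tone, 16))
    print(max(range(16), key=lambda k: spec[k]))
```

With 320 lags arriving every few seconds, even this direct transform is a small workload for a host microcomputer, which is consistent with doing the Fourier step in software.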
IV. DESCRIPTION OF THE CORRELATOR 
AND ACCUMULATOR 
Fig. 2 shows the correlator architecture in greater de- 
tail. The CMOS correlator chips cannot operate at the 
full 2-Gs/s bandwidth so the data are split into eight 
parallel streams and each data stream is cross-correlated 
with all the others. Since there are eight data streams, we 
need 64 cross-correlators to correlate all the data. 
The correlator system is partitioned into eight correla- 
tor chips, each chip containing eight 40-lag cross-correla- 
tors. The host computer reconstructs the autocorrelation 
by adding together lags of equal time offsets. In Fig. 2 the 
data paths are numbered 0-7 and the correlator elements 
are numbered 00 through 07, 10 through 17, ..., and 70
through 77 for each pair of inputs. Correlators 70 and 07
are not redundant because the inputs are not symmetric.
Correlator 00 computes lags with time offsets 0, 8, 16, ...,
and 312. Correlator 07 computes lags 7, 15, 23, ..., and
319. Correlator 70 computes lags -7, 1, 9, 17, ..., and
305. Correlator 77 computes lags 0, 8, 16, ..., and 312.
Note that correlators 00 and 77 compute the same time
offsets using different data. These lags are added together
on the host computer to reconstruct the original autocorrelation
function. In this way, all lags from 0 through 312
are computed, with some redundant lags from -7 to -1
and some extra lags from 313 to 319.
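The lag bookkeeping can be checked with a small software model. The sketch below is an assumption-laden illustration rather than the chip's logic: it demultiplexes a sample sequence into eight streams, runs every cross-correlator pair, and adds results with equal time offsets, using the rule that correlator (delayed i, undelayed j) at internal lag k has time offset j - i + 8k. The function names are hypothetical, and `lags_per_xc` is left as a parameter so that a short test can use fewer than the 40 lags described in the text.

```python
import random
from collections import defaultdict

NUM_STREAMS = 8  # 1:8 demultiplexer, as in the text


def demultiplex(samples):
    """Split the sample sequence into eight interleaved streams."""
    return [samples[i::NUM_STREAMS] for i in range(NUM_STREAMS)]


def cross_correlate(delayed, undelayed, lags_per_xc):
    """One cross-correlator: lag k pairs undelayed[n] with delayed[n - k]."""
    n = len(undelayed)
    return [
        sum(undelayed[t] * delayed[t - k] for t in range(k, n))
        for k in range(lags_per_xc)
    ]


def reconstruct(samples, lags_per_xc=40):
    """Add together lags of equal time offset from all 64 cross-correlators."""
    streams = demultiplex(samples)
    total = defaultdict(int)
    for d in range(NUM_STREAMS):        # delayed stream number
        for u in range(NUM_STREAMS):    # undelayed stream number
            xc = cross_correlate(streams[d], streams[u], lags_per_xc)
            for k, value in enumerate(xc):
                total[u - d + NUM_STREAMS * k] += value
    return dict(total)


def direct(samples, lag):
    """Reference autocorrelation computed on the undivided sequence."""
    return sum(samples[t] * samples[t - lag] for t in range(lag, len(samples)))


if __name__ == "__main__":
    rng = random.Random(0)
    data = [rng.choice([-3, -1, 1, 3]) for _ in range(256)]
    acf = reconstruct(data, lags_per_xc=4)
    # Offsets 0 .. 8 * (4 - 1) are fully covered, mirroring 0 .. 312 for 40 lags.
    print(all(acf[lag] == direct(data, lag) for lag in range(25)))
```

The sample values -3, -1, 1, 3 are an arbitrary stand-in for 2-b data; the reconstruction identity does not depend on the encoding.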
Each cross-correlator has inputs for delayed and undelayed
data (Fig. 3). The $z^{-1}$ symbol refers to unit delay
elements, which retard the data by one sample for each 
correlator lag. A simple dynamic shift register suffices for 
the delay elements. Each unit delay uses a single pipeline 
stage. 
For each correlator lag, the delayed sample is multiplied
by the undelayed sample using a 2-b × 2-b multiplier.
The product is offset and rounded to produce a
positive 3-b result with < 1% sensitivity degradation [5].
The product is accumulated in a 4-b adder, and the carry 
output is fed into a ripple-counter chain that integrates 
the correlation function. A 32-b counter requires 16 s to 
overflow in the worst case at 250 Ms/s. Longer integra- 
tions can be summed on the host computer. After the 
integration period, the counters are frozen and the results 
are sent to the host. The output interface of the correla- 
tor chip looks like a standard byte-wide ROM. 
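The accumulation path for one lag can be modeled in software. The sketch below is a minimal illustration, assuming only what the text states: a positive 3-b product enters a 4-b adder, and each carry out of the adder increments a counter. The offset-and-round encoding of the product is not modeled, and the name `accumulate` is hypothetical.

```python
ADDER_BITS = 4  # width of the per-lag adder described in the text


def accumulate(products):
    """Feed 3-b products through a 4-b adder; return (counter, residue).

    counter counts carries out of the adder (the ripple-counter input);
    residue is what remains in the 4-b sum register.
    """
    modulus = 1 << ADDER_BITS
    residue = 0
    counter = 0
    for p in products:
        if not 0 <= p <= 7:
            raise ValueError("product must fit in 3 bits")
        residue += p
        if residue >= modulus:  # carry out of the 4-b adder
            residue -= modulus
            counter += 1
    return counter, residue


if __name__ == "__main__":
    counter, residue = accumulate([7, 7, 7, 5, 0, 3])
    # The counter and the residue together hold the exact running total.
    print(counter * 16 + residue == 29)
```

Because a 3-b product is at most 7, the adder produces at most one carry per sample, so a single-bit carry into the counter chain loses no information.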
V. MAPPING THE CORRELATOR INTO MICROPIPELINED 
CMOS TECHNOLOGY 
The functional blocks of the correlator and accumula- 
tor match extremely well to the capabilities of mi- 
cropipelined CMOS. High density and throughput are 
critical for the correlator circuits, and for the first few 
stages of the accumulator. The output of the accumulator 
is very slow, and this matches the slower output rates for 
CMOS devices. As a result, the CMOS correlator runs at 
ECL speeds of 250 Ms/s yet still has the density of 
CMOS. 
Power dissipation requires special attention for these 
chips. Each 320-lag correlator chip is expected to produce 
over 6 W of heat based on test chips and SPICE simula- 
tions. Aluminum heat sinks are needed on the ceramic 
PGA packages to dissipate the heat. 
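The heat estimate is consistent with the per-lag power quoted in the abstract. The short check below is only arithmetic on figures from the paper (20 mW per lag, 320 lags per chip); the constant and function names are hypothetical.

```python
POWER_PER_LAG_W = 0.020  # 20 mW per correlator lag, from the abstract
LAGS_PER_CHIP = 320      # lags on the full correlator chip


def chip_power_watts(lags=LAGS_PER_CHIP, power_per_lag=POWER_PER_LAG_W):
    """Estimated dissipation of one correlator chip, in watts."""
    return lags * power_per_lag


if __name__ == "__main__":
    print(round(chip_power_watts(), 3))
```

The result, about 6.4 W, agrees with the "over 6 W" figure given above.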
Fig. 4 shows a single pipelined correlator lag. The 3-b 
product (MSB, MID, LSB) is computed in the first two 
pipeline stages, followed by a 4-b accumulation. Note that 
there are at most two logic gates between pipeline stages, 
which results in high throughput, at the cost of added latency, for the
correlator. This approach can double the area of a logic
circuit, but more than doubles the speed of the circuit 
compared to a combinatorial approach. The maximum 
delay goes from nine gates to two gates as the mi- 
cropipeline registers are placed in the circuit. 
The gates use complementary pass-transistor logic [6],
which offers fast XOR gates with the same area as NAND 
and NOR gates. The data path of the correlator uses XOR 
gates extensively, so complementary pass-transistor logic 
is particularly attractive. 
Fig. 3. CMOS correlator chip structure. (Diagram labels: 2 bits @ 250 Ms/s; delayed data.)
Fig. 4. Logic for a single correlator lag. MSB, MID, and LSB refer to
the 3-b product. S0-S3 hold the 4-b sum. C4 is the carry output from
the correlator adder. At most, two logic gates lie between each register
stage. (Diagram labels: delayed data; undelayed data; full adder; half adder; counters.)
VI. CMOS CORRELATOR FLOOR PLAN 
A critical aspect of the CMOS layout is to minimize the 
parasitic loading on internal nodes of the correlator chip. 
Fortunately, the correlator can be laid out exclusively 
with nearest-neighbor communications between cells. All 
global clocks and input signals are driven from external 
GaAs FET's or bipolar devices. This approach eliminates 
large parasitic capacitances caused by internal global 
wires. 
Fig. 3 shows the basic layout structure of the correlator. 
The nearest-neighbor communication minimizes the RC 
time constants associated with the signal wires. The unde- 
layed data lines are driven externally. The layout of the 
correlator chip is shown in Fig. 5. Twelve lags of the 
correlator fit onto a single tiny chip fabricated through 
the MOSIS prototype fabrication service [7]. The chip
measures 2.2 mm on each side. Roughly 320 correlator
lags fit onto a chip measuring 8 × 9 mm.
Fig. 5. CMOS correlator with 12 lags and 16-b accumulators on a
MOSIS tiny chip. (Layout label: slow output to host computer.)
VII. GaAs DIGITIZER 
The fastest part of the spectrometer is the A/D subsys- 
tem, which consists of a gallium-arsenide flash converter 
coupled to a 1 : 8 demultiplexer/distributor for the CMOS 
correlator. The data path from the A/D must run at 2 
Gs/s. The distributor reduces the bandwidth to 250 Ms/s 
in eight parallel streams and drives the data to the corre- 
lator. The A/D and the distributor are combined on one 
chip to eliminate the need to run 2-Gs/s signals on the 
circuit board (Fig. 6). The system benefits from less noise, 
crosstalk, and power consumption as a result. 
A possible risk in putting the distributor on the same
chip as the A/D is that the output signals may couple
back into the analog input and distort the resulting spectrum
in a data-dependent manner. This risk is reduced by
separating analog power and ground from digital power
and ground and by placing the analog and digital sections
on opposite sides of the chip.
Fig. 6. Prototype GaAs 2-GHz 2-b A/D converter with a 1 : 8 demultiplexer/distributor. (Diagram labels: threshold voltages; flash converter; 3:2 encoder; LSB and MSB parallel outputs.)
Fig. 7. Floor plan for the correlating spectrometer system on a PC-compatible circuit board. (Diagram labels: host interface; correlator chips.)
VIII. SYSTEM LAYOUT
Fig. 7 shows the layout for the digital spectrometer
circuit board. The eight PGA's on the right half of the
board hold the CMOS correlator chips. The two chips in
the middle of the board represent the GaAs A/D and
demultiplexing circuitry. The connector end of the board
holds the bus interface circuits. The signal and spectrometer
clock are driven externally.
Correlating spectrometers for astronomy have historically
required a full rack of equipment, and have been
marginally reliable due to the large number of chips used.
Our target implementation fits onto a single card for a
PC-compatible computer, using only ten chips plus the
host interface. For astronomical applications this represents
a major breakthrough in cost and simplicity.
IX. PROJECT STATUS 
The GaAs chips have been designed and built and are
currently undergoing testing. Preliminary CMOS test chips
consisting of micropipelined delay lines and 2 × 2-b multiplier
circuits have been fabricated and successfully tested
at data rates of 250 Ms/s, confirming the simulations
obtained with Berkeley SPICE 3C1. The small correlator
chip of Fig. 5 has been simulated, fabricated with a 2-μm
CMOS n-well process, and successfully tested at 50 Ms/s.
A new version of the chip with resized transistors is 
expected to operate at 250 Ms/s. This chip is under 
fabrication; also, a large chip with eight cross-correlators 
is being designed. 
X. PERFORMANCE IMPROVEMENT
An important figure of merit for correlating digital
spectrometers is the number of lags per chip times the maximum
data rate of the chip, expressed in lags per second
per chip. Existing designs using ECL gate arrays have
achieved and 4*4 lags per second per chip,
not including the accumulating counters [1]. Analog CCD
correlators have achieved 5 billion lags per second per
chip [8] without the accumulators. The micropipelined
CMOS correlator described in this paper is expected to
achieve 80 billion lags per second per chip based on 320
lags and 250 Ms/s. This estimate is conservative since it
includes the area required for the accumulating counters.
The order-of-magnitude improvement in lags per second
per chip stems from the high throughput and density
available with micropipelined CMOS. The output bandwidth
is reduced by integrating the correlators and the
accumulators, which permits the high-density CMOS chips
to run at ECL speeds. Micropipelined CMOS can dramatically
boost performance when the output data rate is
reduced compared to the processing data rate.
ACKNOWLEDGMENT
The author wishes to thank Prof. W. J. Dally at M.I.T.
and Prof. T. G. Phillips at Caltech for reviewing the
manuscript.
REFERENCES
[1] J. Wilson and K. Chandra, "Spectrometers for space microwave
radiometers," Jet Propulsion Lab., Pasadena, CA, Rep., Mar. 1990.
[2] I. E. Sutherland, "Micropipelines," Commun. ACM, vol. 32, no. 6,
pp. 720-738, June 1989.
[3] J. Yuan and C. Svensson, "High-speed CMOS circuit technique,"
IEEE J. Solid-State Circuits, vol. 24, no. 1, pp. 62-70, Feb. 1989.
[4] B. F. C. Cooper, "Auto-correlation spectrometers," in Methods of
Experimental Physics, vol. 12, Part B: Astrophysics and Radio Telescopes.
New York: Academic, 1976, p. 280.
[5] B. F. C. Cooper, "Correlators with two-bit quantisation," Australian
J. Phys., vol. 23, pp. 521-527, 1970.
[6] K. Yano et al., "A 3.8-ns CMOS 16 × 16-b multiplier using complementary
pass-transistor logic," IEEE J. Solid-State Circuits, vol. 25,
no. 2, pp. 388-395, Apr. 1990.
[7] MOSIS User Manual, Inform. Sci. Inst., Univ. of Southern California,
Marina del Rey, CA, 1988.
[8] S. C. Munroe, D. R. Arsenault, K. E. Thompson, and A. L. Lattes,
"Programmable, four-channel, 128-sample, 40-Ms/s analog-ternary
correlator," IEEE J. Solid-State Circuits, vol. 25, no. 2, pp. 425-429,
Apr. 1990.
A Silicon Model of an Auditory Neural Representation 
of Spectral Shape 
John Lazzaro 
Abstract: The paper describes an analog integrated circuit
that implements an auditory neural representation of spectral 
shape. The circuit contains silicon models of the cochlea, inner 
hair cells, spiral ganglion cells, and the neurons that compute 
an amplitude-invariant representation of spectral shape. The 
chip uses the temporal information in each silicon auditory-nerve 
fiber to compute this final representation. The chip was fabri- 
cated and fully tested; the paper includes data comparing the 
silicon auditory-nerve representation and the final representation.
The 9000-transistor chip computes all outputs in real time
using analog continuous-time processing.
Manuscript received September 20, 1990; revised January 10, 1991.
This work was supported by the Office of Naval Research and the
Defense Advanced Research Projects Agency.
The author was with the California Institute of Technology, Pasadena,
CA 91125. He is now with the University of Colorado, Boulder, CO
80309-0425.
IEEE Log Number 9143199.
I. INTRODUCTION
THE COCHLEA is the sense organ of hearing. It
converts acoustic signals into the first neural representation
of audition; the auditory nerve carries this representation
to the brain. Outputs from the left and right
cochleas serve as inputs for the neural structures that 
perform spatial sound localization and sound understand- 
ing. In addition, several species of animals use their 
cochleas as sensors for active sonar processing. 
Sound recognition, sound localization, and active sonar 
are practical and interesting engineering endeavors. There 
is renewed interest by the engineering community in 
0018-9200/91/0500-0772$01.00 © 1991 IEEE
