A high speed CMOS correlator by Whitaker, S. & Canaris, J.
N94-71109
2nd NASA SERC Symposium on VLSI Design 1990 3.3.1
A High Speed CMOS Correlator
J. Canaris and S. Whitaker
NASA Space Engineering Research Center
for VLSI System Design
University of Idaho
Moscow, Idaho 83843
Abstract - A full custom, 25 MHz, 1.6 µm CMOS Correlator chip is presented.
The 5.15mm by 4.23mm chip performs either autocorrelation or crosscorrelation,
consuming less than 10mW per channel. The correlator, designed for a space
borne spectrometer, contains 32 channels. The 24 bit accumulator registers
can be read independent of the input data path, in either 8 bit bytes, or 16 bit
words. The device is cascadable and allows integration periods of up to 1.78
seconds, at 25 Megasamples/second. The controllers, for the input data path
and the data output section, are implemented with Sequence Invariant State
Machines.
1 Introduction
A high speed, low power CMOS correlator chip is presented in this paper. The correlator,
designed for a space borne spectrometer, contains 32 time-lag channels, each of which
contains a biasing multiplier, a 4 bit accumulator and a 24 bit counter. The sensing
instruments provide the chip with two 2 bit input words, which can be either the same
signal, for autocorrelation, or different signals, for crosscorrelation. The biasing multiplier
does not perform binary multiplication, but implements a special function described in
Section 4. External control of the correlator is quite simple, requiring only a reset pin and
a pin to signal the end of an integration period. Simple handshaking is provided through
a single output pin, which signals when data is available to be read from the output port.
Output data can be read while integration is in progress, in either 8 bit bytes or 16 bit
words, under the control of a user provided strobe. The correlator is capable of maintaining
a 25 Megasample/second input data rate, with integration periods of up to 1.78 seconds.
Data can be output from the chip at 10 MHz. The chip consumes less than 10mW/channel
of average power. Auxiliary ports are provided for both of the data inputs.
The data path of this chip is extremely regular; the initial layout of the core required
only 30 hours to complete. The controllers for the input data path and the data output
section are implemented with Sequence Invariant State Machines [1], and were initially
laid out with a compiler described in [2]. This chip is among the first VLSI designs to
utilize Sequence Invariant State Machines.
Some of the main features of the correlator chip are listed below.
• Autocorrelation or Crosscorrelation
• 25 Megasamples/second
• 32 channels
• Up to 1.78 second integration time at 25MHz
• Low Power (< 10mW per channel)
• Cascadable
• Selectable auxiliary ports on the data inputs
• Integration can continue while data is output
• CMOS and TTL compatible inputs
2 General Description
The correlator chip accepts two 2 bit data streams clocked at a maximum rate of 25MHz.
Delayed versions of one stream are multiplied with the current data of the other stream.
Products for each delay (channel) are accumulated and the accumulator overflows are
counted. This procedure continues for one integration period, as defined by a control line
(INT) held low. Integration is performed continuously until INT is strobed high. At this
time the overflow counters from each channel are isolated from their respective accumulators.
After the counters have settled, DATARDY goes high to signal that data is available
for output. The overflow counters are then cleared and reconnected to the accumulators,
and a new integration period begins. The contents of the overflow counters
are output, under user control, through a 24 bit wide shift register after DATARDY goes
high. Data output is either word serial or byte serial, under user control. When word serial
mode is selected only the 16 most significant bits of each channel are output. DATARDY
will remain high until all of the 32 output registers have been read, regardless of which
output mode is selected. A test mode is provided to decrease the time required to test the
onboard overflow counters.
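The procedure described above can be sketched behaviorally. The following Python model is an illustration only and not the chip's logic: the function name `correlate`, the pluggable `product` argument (the chip itself uses the biased mapping of Section 4), the zero-filled delay line, and the choice that channel k sees the A stream delayed by k clocks are all assumptions made for the example.

```python
from collections import deque

def correlate(a_stream, b_stream, channels=32, product=lambda a, b: a * b):
    """Accumulate product(delayed A, current B) for each time-lag channel.

    Assumptions: the delay line starts filled with zeros, and after each
    shift, position k holds the A sample delayed by k clocks.
    """
    delay_line = deque([0] * channels, maxlen=channels)
    sums = [0] * channels
    for a, b in zip(a_stream, b_stream):
        delay_line.appendleft(a)              # shift the new A sample in
        for k in range(channels):
            sums[k] += product(delay_line[k], b)
    return sums

# Crosscorrelation: the pulse in B arrives two clocks after the pulse in A,
# so only the lag-2 channel accumulates a nonzero sum.
print(correlate([1, 0, 0, 0], [0, 0, 1, 0], channels=4))
```

Passing the same sequence as both `a_stream` and `b_stream` gives the autocorrelation case.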
3 Functional Description
3.1 Initialization
The chip must be powered up with RN held low for at least 10 clock cycles, while OUTCK
and INT are held low. This will bring the chip into a sanity state while guaranteeing that
the output pads will be tristated. During this time the overflow counters will be cleared
and the control state machine will be prepared for normal operation. Integration will begin
on the clock following RN being brought and held high. The delay path shift register and
the channel accumulators can never be cleared. Figure 1 shows a block diagram of the
correlator chip.
Figure 1: Functional Block Diagram. (Blocks: input multiplexers for A/AUXA and B/AUXB, selected by AUXINA and AUXINB; a chain of delay elements ending at AOUT; per channel, a biasing multiplier, adder/accumulator and counter/shift register; an output multiplexer driving DO0-DO15. Control pins: CLOCK, RN, INT, OUTCK, BYTE, DATARDY.)
3.2 Data Input
Data will be input to the chip on the A and B data buses (A0, A1 and B0, B1) every clock cycle.
Although data will still be clocked into the chip, processing will not occur between INT
being strobed high and DATARDY going high. Pins A0 and A1 are the least significant
and most significant bits, respectively, of the delay line. B0 and B1 are the least significant
and most significant bits, respectively, of the undelayed signal.
Additionally, two auxiliary input ports (AUXA0, AUXA1 and AUXB0, AUXB1) are provided.
These ports are multiplexed with the A and B buses, respectively, under the control
of the AUXINA and AUXINB pins. When AUXINA is held high, the A bus becomes
the input port to the chip; when AUXINA is held low, the AUXA bus becomes the input
port to the chip. When AUXINB is held high, the B bus becomes the input port to the
chip; when AUXINB is held low, the AUXB bus becomes the input port to the chip. The
auxiliary input ports behave identically to the primary ports.
3.3 Correlation
Correlation begins on the clock following RN being brought and held high or when INT
is held low and DATARDY goes high (signaling that data is ready from the previous
integration period). At that time each channel will multiply the data on the B bus with
the output from its respective delay element. The product will be accumulated with the
previous sum for that channel. Any overflow from the accumulator will be counted in the
overflow counter of that channel. This process will continue until the INT signal is strobed
high for at least 1 clock cycle.
The multiplications are not purely binary in nature. The output of the multiplier is
biased in a manner described in Section 4. The accumulator contains a four bit adder and
four bit register. The carry out of the adder is the toggle signal into the overflow ripple
counter. The overflow counters are 24 bits wide, allowing for the count of up to 2^24 - 1
overflows. The frequency of overflow is a function of sample frequency and the length of
the integration period.
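The relationship between the accumulator, the overflow counter and the integration limit can be illustrated with a short sketch. It assumes, and this is an inference rather than a statement from the paper, that the quoted 1.78 second limit corresponds to the worst case in which every product takes the largest value in the Figure 2 table, namely 6. The class name `Channel` and the variable names are hypothetical.

```python
class Channel:
    """4-bit accumulator whose carry out increments a 24-bit overflow counter."""

    def __init__(self):
        self.acc = 0        # 4-bit accumulator register
        self.overflows = 0  # 24-bit overflow counter

    def add(self, product):
        total = self.acc + product
        self.acc = total & 0xF                 # keep the low 4 bits
        if total > 0xF:                        # carry out of the 4-bit adder
            self.overflows = (self.overflows + 1) & 0xFFFFFF

MAX_PRODUCT = 6            # largest 3-bit product in the Figure 2 table
SAMPLE_RATE = 25e6         # samples per second
CAPACITY = (1 << 28) - 1   # largest sum held by counter and accumulator together

# Worst case (assumed): every sample adds MAX_PRODUCT to the running sum.
max_samples = CAPACITY // MAX_PRODUCT
max_period = max_samples / SAMPLE_RATE
print(max_samples, max_period)
```

Under that assumption the limit evaluates to roughly 1.79 seconds, which is in line with the integration period quoted for 25 Megasamples/second.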
3.4 Data Output
At the end of the integration period, signaled by INT being strobed high for at least 1
clock cycle, the overflow counters will be isolated from the processing elements. After the
overflow counters have settled (10 clock periods, maximum) the contents of the counters
will be dumped into an output shift register and the chip will signal that data is ready
(DATARDY) on the output bus. When DATARDY goes high the first data can be read
from the output bus on the next rising edge of OUTCK. When DATARDY goes high a
new integration period begins. DATARDY will remain high until all 32 output registers
have been read, regardless of which output mode is selected.
At all times, after the clock starts, data will be output from the end of the delay line
at pins AOUT0 and AOUT1. AOUT0 is the delayed version of A0 and AOUT1 is the
delayed version of A1. These pins may be used in a cascaded system by connecting them
to the A0 and A1 pins of the next chip downstream.
Data output is, optionally, word serial or byte serial. Holding the output control signal
(BYTE) high during the output phase will output the 24 bits of the counters in 8 bit bytes,
most significant byte first. Pins DO0-DO7 will be used (DO0 being the least significant bit).
Pins DO8-DO15 will be tristated. Holding BYTE low will cause the 16 most significant
bits of the counter to appear on pins DO0-DO15.
The output pins DO0-DO15 will be tristated whenever DATARDY is low or when
DATARDY is high and OUTCK is low.
3.4.1 Word Serial Mode
When BYTE is held low (word serial mode), successive output words will be clocked out
by the rising edge of OUTCK. During the low half of the OUTCK cycle the output bus
will be tristated. OUTCK has a minimum frequency of 0Hz and a maximum frequency of
10MHz. OUTCK periods do not have to be of equal length and the duty cycle need not
be 50%, but a minimum pulse width of 44ns is required.
During the output phase the output data shift register will be cleared. The output
clock must be strobed 32 times to unload the output shift registers. Data on the A and B
buses continues to be input to the chip during the output phase.
Data output is terminated by bringing and holding OUTCK low after reading out all
32 channels.
3.4.2 Byte Serial Mode
When BYTE is held high (byte serial mode), successive output bytes will be clocked out
by the rising edge of OUTCK. During the low half of the OUTCK cycle the output bus
will be tristated. OUTCK has a minimum frequency of 0Hz and a maximum frequency of
10MHz. OUTCK periods do not have to be of equal length and the duty cycle need not
be 50%, but a minimum pulse width of 44ns is required.
During the output phase the output data shift register will be cleared. The output
clock must be strobed 96 times to unload all of the output registers. This option allows
access to the 8 least significant bits of the overflow counters, as well as providing an 8 bit
data bus.
At the end of the output phase the chip will be ready to begin a new correlation. Data
on the A and B buses continues to be input to the chip during the output phase.
Data output is terminated by bringing and holding OUTCK low after reading out all
32 channels.
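The two output modes can be sketched as follows. This illustrates the data formatting described in Sections 3.4.1 and 3.4.2 and is not a model of the chip's circuitry; the function names are hypothetical.

```python
def byte_serial(counter):
    """Return the three bytes of a 24-bit counter, most significant byte first."""
    return [(counter >> shift) & 0xFF for shift in (16, 8, 0)]

def word_serial(counter):
    """Return the 16 most significant bits of a 24-bit counter."""
    return (counter >> 8) & 0xFFFF

def strobes_needed(byte_mode, channels=32):
    """OUTCK strobes required to unload every channel."""
    return channels * (3 if byte_mode else 1)

print([hex(b) for b in byte_serial(0xABCDEF)], hex(word_serial(0xABCDEF)))
print(strobes_needed(byte_mode=True), strobes_needed(byte_mode=False))
```

At the maximum OUTCK frequency of 10MHz, the 96 strobes of byte serial mode take at least 9.6 microseconds and the 32 strobes of word serial mode at least 3.2 microseconds.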
3.5 Test Mode
Test mode provides a method for checking the operation of the 24 bit overflow counters.
Test mode is entered by bringing and holding TEST high, while in an integration period.
Test mode breaks the overflow counters into three 8 bit counters, the inputs to which
are the overflow bit of the adder in each channel. An appropriate input pattern must be
applied to the A and B buses during test mode operation. Access to the counters is the
same as during normal operation.
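A rough sense of the time saved by test mode can be obtained from the counter widths alone. The sketch below assumes that exercising a ripple counter means stepping it through every state, and that in test mode the three 8 bit segments are clocked in parallel by the same overflow bit; the function name is hypothetical.

```python
def overflows_to_exercise(counter_bits):
    """Overflow pulses needed to step a ripple counter through every state."""
    return 1 << counter_bits

normal = overflows_to_exercise(24)   # one 24-bit counter
test = overflows_to_exercise(8)      # three 8-bit counters clocked in parallel
print(normal, test, normal // test)
```

Under these assumptions the number of adder overflows required drops by a factor of 2^16.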
4 Biasing Multiplication
The multiplication to be performed takes two 2 bit input words and forms a 3 bit product.
The mapping of data is described in Figure 2. B0 and B1 are the real-time inputs. A0
and A1 are the delay line inputs. P0, P1 and P2 are the product outputs.
B1 B0 A1 A0 | P2 P1 P0
 0  0  0  0 |  0  1  1
 0  0  0  1 |  1  0  0
 0  0  1  0 |  0  1  1
 0  0  1  1 |  0  1  0
 0  1  0  0 |  1  0  0
 0  1  0  1 |  1  1  0
 0  1  1  0 |  0  1  0
 0  1  1  1 |  0  0  0
 1  0  0  0 |  0  1  1
 1  0  0  1 |  0  1  0
 1  0  1  0 |  0  1  1
 1  0  1  1 |  1  0  0
 1  1  0  0 |  0  1  0
 1  1  0  1 |  0  0  0
 1  1  1  0 |  1  0  0
 1  1  1  1 |  1  1  0
Figure 2: Biasing Multiplication Truth Table
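The mapping of Figure 2 can be written as a lookup table. The sketch below is an illustration, not the pass transistor implementation used on the chip; the names `PRODUCT` and `biasing_multiply` are hypothetical. Treating 3 as the bias is an inference from the table values (they are symmetric about 3) and is not stated in the paper.

```python
# Rows of Figure 2, indexed by (B1, B0, A1, A0) read as a 4-bit number;
# each entry is the 3-bit product (P2, P1, P0) read as an integer.
PRODUCT = [3, 4, 3, 2,   # B = 00
           4, 6, 2, 0,   # B = 01
           3, 2, 3, 4,   # B = 10
           2, 0, 4, 6]   # B = 11

def biasing_multiply(b, a):
    """Look up the biased product for 2-bit inputs b (real-time) and a (delayed)."""
    return PRODUCT[(b << 2) | a]

# Subtracting the assumed bias of 3 leaves a signed product for each entry.
unbiased = [p - 3 for p in PRODUCT]
print(sorted(set(unbiased)))
```

With the bias removed, every entry falls in the set -3, -1, 0, +1, +3, and the table is unchanged when the roles of A and B are swapped.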
5 Data Path Design
The 32 channels integrated on the correlator chip are identical. A single channel
consists of 2 delay elements for the time-lag input signal, one biasing multiplier, one 4 bit
adder with a 4 bit accumulator register and a 24 bit counter.
Figure 3: General block diagram. (Blocks: Destination State Codes, Input Switch Matrix, All Next States, Next State Logic, D FF.)
The overflow counter, which stores correlator integration values, is an asynchronous
ripple counter. The asynchronous design reduces the power requirements of the chip. The
biasing multiplier is implemented as an n-transistor pass network [3,4,5,6], which yields a very
dense, regular, combinational logic network, while providing the operational speed required
by the 25 MHz clock frequency. The four bit adder is implemented as a Transmission
Gate Conditional Sum Adder [7,8]. This adder provides the performance needed while
minimizing both capacitive loads and silicon area required to implement this function.
In general, the use of pass transistor networks reduces nodal capacitance, which is an
important consideration in low power applications. Each channel also contains two pipe
registers, between the carry out of the adder and the input to the overflow counter. These
registers serve two purposes. First, without them the propagation path would be too long
for a 40ns clock period. Second, the registers provide a means for isolating the ripple
counters from the input data path. At the end of an integration period, the counters must
be allowed to settle before output data is ready for reading. All registers are static.
6 Controller Design
The correlator chip requires two controllers. The input data path controller maintains the
state information required by the 32 channels. The input data path has two main states.
The first state is normal integration and the second is the preparation of integrated data
for output. The output controller provides the control of the output shift register and the
formatting of the data sent to the output port.
Both controllers were designed using Sequence Invariant State Machines [1]. The logic
of such state machines is invariant with respect to the actual sequence required. Only the
number of states and inputs need to be known to specify the logic.
A general block diagram of a Sequence Invariant State Machine is shown in Figure 3.
The logic that forms each next state equation, Yi, consists of a storage device (a D flip-flop),
next state excitation circuitry, a Binary Tree Structured (BTS) network, which generates
the next state values to the flip-flop, and input logic consisting of a pass transistor matrix.
Present state information is fed back to the next state logic and input information drives
the input switch matrix.
A general BTS network is employed to formulate the next state equations for sequential
circuits. This general BTS network represents a complete decoding of an input space and
hence only constants are input to the network. Any specific function can be realized by
simply changing the pass variable constants, 1(0), at the input to the appropriate branch.
The input matrix is programmed with appropriate connections to 1(0) to produce the
desired state transitions. Changing the sequence of operation merely requires a reprogramming
of these connections. For the correlator state machines, the input switch matrix
was eliminated by applying the inputs, I, as pass variables to the BTS network. Work to
produce a general theory for this logic reduction is currently under way at the UI NASA
SERC.
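The idea of a complete decoding with only constants at the inputs can be illustrated in software. The sketch below is an analogy rather than a circuit description: a binary tree of two-way selections is steered by the control variables, and the function realized is determined entirely by the list of leaf constants. The name `bts_eval` and the example leaf lists are hypothetical.

```python
def bts_eval(leaves, controls):
    """Evaluate a binary tree structured network.

    leaves   -- 2**n constants (0 or 1), one per fully decoded input combination
    controls -- n control bits, most significant first
    """
    assert len(leaves) == 1 << len(controls)
    level = list(leaves)
    for bit in reversed(controls):            # least significant variable first
        # each pair of branches passes one of its two inputs upward
        level = [level[i + bit] for i in range(0, len(level), 2)]
    return level[0]

# Changing only the leaf constants changes the function that is realized.
XOR_LEAVES = [0, 1, 1, 0]
AND_LEAVES = [0, 0, 0, 1]
print([bts_eval(XOR_LEAVES, [a, b]) for a in (0, 1) for b in (0, 1)])
```

The tree itself never changes between the two examples; only the constants differ, which mirrors the reprogramming argument made above.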
6.1 Input Controller
During integration, the input state machine connects the 24 bit overflow counters to the
calculation section, while disconnecting the output shift register. At the end of an integration
period, signaled by INT going active, the controller moves through a fixed set of
states. The state machine first isolates the overflow counters from the calculator. The
machine then steps through a number of states while the ripple counters settle. At that
time the counters are first parallel loaded into the output shift register and then reset. The
state machine then reconnects the counters to the calculator, beginning a new integration
period. This controller also sets a latch which signals that data is ready for reading.
The use of Sequence Invariant State Machines proved invaluable in this application.
At the time of initial logic design, it was unknown how long it would take for the ripple
counters to settle. It was desirable to minimize the length of time that the counters were
disconnected, as this time shortens the integration period. The state machine was initially
designed with a number of wait states which was deemed sufficient. After circuit design
was finished on the counters, several states could be removed. The redesign of the state
machine required about 10 minutes of engineering time and about 10 minutes of layout
time. This is a significant improvement over traditional state machine designs. The output
equations of the signals required to control the data path are formed by logical blocks which
are identical to those in the state machine itself, as described in [9]. Again as the number
of states changed, the output equations were also easily modified.
The input state machine requires only two external signals, INT and RESET, for proper
operation.
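The fixed sequence followed by the input controller, and the effect of changing the number of wait states, can be sketched as below. The state names and the wait state counts used in the example are hypothetical; the paper does not give the actual state encoding or the final number of wait states.

```python
def input_controller_sequence(wait_states):
    """List the states visited after INT signals the end of an integration.

    The data ready latch set by the controller is not modeled here.
    """
    return (["ISOLATE_COUNTERS"]
            + ["WAIT"] * wait_states          # ripple counters settle
            + ["LOAD_SHIFT_REGISTER",
               "RESET_COUNTERS",
               "RECONNECT_COUNTERS"])

# Removing wait states shortens the sequence but leaves every other step as is.
# The counts 8 and 5 are arbitrary values chosen for illustration.
before = input_controller_sequence(8)
after = input_controller_sequence(5)
print(len(before), len(after))
```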
6.2 Output Controller
The output state machine provides signals which control the output and formatting of
correlated data. This chip has two modes for data output, byte serial mode and word
serial mode, which are selected with the BYTE pin. When the circuit is in byte mode the
24 bit counters are read out in 8 bit bytes. Word mode sends only the upper 16 bits of
each register. Output is under user control. New data appears on the output pins on the
rising edge of OUTCK. When OUTCK is low the output pins are tristated. Additionally,
when in byte mode, the upper 8 bits of the output port are tristated. When all data has
been read the DATARDY flag is reset.
The output controller, therefore, must control several operations. Byte mode requires
that the three 8 bit portions of the overflow counters be multiplexed onto the lower 8
bits of the output port. The mux control signals are formed by the output controller.
Additionally, the tristate signal, for the upper 8 bits of the output port, is provided by
this machine. A seven bit counter is decoded for either byte or word mode. This counter
maintains a count of the amount of data which has been read out. When the output shift
register is empty, the DATARDY flag is reset. This controller also provides the clock for
the output shift register itself.
Again, as in the input controller, Sequence Invariant State Machines were utilized in
this controller. As logic design progressed, inevitable changes were easy to implement with
these functional blocks.
This controller requires three external signals: BYTE, which signals which output mode is
active; DRI, which is the data ready signal from the input controller; and RESET.
7 Layout
The mask design of the correlator chip was straightforward as the structure is extremely
regular. The base cells and the layout of the correlator core required only 30 hours of layout
time. The core of the chip, the 32 correlator channels, contains 31,948 transistors. The
n-transistor to p-transistor ratio is 1.77, which reflects the extensive use of n-transistor
pass networks. The silicon area consumed by the core is 3.49mm by 2.52mm, which yields
a density of 275.3 µm²/device.
The layout of the Sequence Invariant State Machines was done with a pre-released
version of the silicon compiler described in [2]. The correlator chip, as a whole, contains no
more than 120 random devices. Figure 4 is a plot of the correlator chip.
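The core density quoted in this section can be checked from the core dimensions and the transistor count. The variable names below are chosen for illustration only.

```python
core_width_um = 3.49e3     # 3.49 mm expressed in micrometers
core_height_um = 2.52e3    # 2.52 mm expressed in micrometers
transistors = 31_948

density = core_width_um * core_height_um / transistors
print(f"{density:.1f} square micrometers per device")
```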
8 Summary
A 25 MHz CMOS correlator chip has been described. The chip provides either crosscorrelation
or autocorrelation of 2 bit input signals at a data rate of 25 Megasamples/second. The
32 channel chip, designed for space applications, consumes no more than 10mW/channel.
The VLSI circuit has two options for data output and provides a simple handshaking
scheme. The layout of the correlator is highly regular and has taken advantage of Sequence
Invariant State Machines in the controller design.
9 Acknowledgements
The authors wish to acknowledge the NASA Space Engineering Research Center program,
whose support made this design possible. The staff of the Microelectronics Research Center
also deserves credit for all of the assistance they have provided. The support of Dr. Gary
Maki, director of the MRC, was also invaluable.
Figure 4: Correlator Chip Plot
References
[1] S. Whitaker and S. Manjunath, "Sequence Invariant State Machines," Proceedings of
the NASA SERC 1990 Symposium on VLSI Design, Moscow, Idaho, January 1990,
pp. 241-252.
[2] D. Buehler, S. Whitaker and J. Canaris, "Automated Synthesis of Sequence Invariant
State Machines," Paper to be presented at the 2nd NASA SERC Symposium on VLSI
Design 1990, Moscow, Idaho, November 1990.
[3] D. Radhakrishnan, S. Whitaker and G. Maki, "Formal Design Procedures for Pass
Transistor Switching Circuits," IEEE JSSC, Vol. SC-20, April 1985, pp. 531-536.
[4] G. Peterson and G. Maki, "Binary Tree Structured Logic Circuits: Design and Fault
Detection," Proceedings of IEEE International Conference on Computer Design: VLSI
in Computers, Port Chester, NY, Oct. 1984, pp. 671-676.
[5] C. Pedron and A. Stauffer, "Analysis and Synthesis of Combinational Pass Transistor
Circuits," IEEE Transactions on Computer-Aided Design of Integrated Circuits and
Systems, Vol. 7, July 1988, pp. 775-786.
[6] J. Canaris, "A High Speed Fixed Point Binary Divider," Proceedings ICASSP-89,
Glasgow, Scotland, May 1989, pp. 2393-2396.
[7] A. Rothermel et al., "Realization of Transmission-Gate Conditional-Sum (TGCS)
Adders with Low Latency Time," IEEE JSSC, Vol. 24, June 1989, pp. 558-561.
[8] J. Canaris and K. Cameron, "A Comparison of Two Fast Binary Adder Configurations,"
Proceedings of the NASA SERC 1990 Symposium on VLSI Design, Moscow,
Idaho, January 1990, pp. 78-86.
[9] S. Whitaker, G. Maki and M. Canaris, "A Programmable Architecture for CMOS Sequential
Circuits," Proceedings of the NASA SERC 1990 Symposium on VLSI Design,
Moscow, Idaho, January 1990, pp. 223-230.
