A 32-bit Ultrafast Parallel Correlator using Resonant Tunneling Devices by Haddad, George I. et al.
A 32-bit Ultrafast Parallel Correlator Using Resonant Tunneling Devices
Shriram Kulkarni, Pinaki Mazumder, and George I. Haddad
Department of Electrical Engineering and Computer Science
The University of Michigan
1301 Beal Avenue, Ann Arbor, MI 48109-2122
Abstract
An ultrafast 32-bit pipelined correlator has been implemented using resonant tunneling
diodes (RTDs) and heterojunction bipolar transistors (HBTs). The negative differential resistance
(NDR) characteristic of RTDs is the basis of logic gates with the self-latching property that
eliminate pipeline area and delay overheads which limit throughput in conventional technologies. The
circuit topology also allows threshold logic functions such as minority/majority to be imple-
mented in a compact manner resulting in reduction of the overall complexity and delay of arbi-
trary logic circuits. The parallel correlator is an essential component in CDMA transceivers used
for the continuous calculation of correlation between an incoming data stream and a PN sequence.
Simulation results show that a nano-pipelined correlator can provide an effective throughput of
one 32-bit correlation every 100 ps, using minimal hardware, with a power dissipation of 1.5
watts. RTD+HBT based logic gates have been fabricated and the RTD+HBT based correlator is
compared with state of the art CMOS implementations.
1. Introduction
Space based communication systems experience a low signal-to-noise (S/N) ratio in the
transmission channel and have inherently low power budgets for communication. An added constraint
is the requirement of high reliability and security for space to earth transmissions due to their vital
nature in supporting military and civilian systems. Spread spectrum communication increases
transmission bandwidth by distributing the data signal energy over a large frequency band by use
of a pseudo-noise (PN) spreading sequence. The uniqueness of the PN sequence results in receiv-
ers being able to detect the transmitted signal even in the high noise environments due to low
cross correlation with extraneous transmissions. Thus, the required transmitter power is reduced
in spread spectrum systems. Spread spectrum signals have low probability of detection by unin-
tended receivers and hence provide good security. Similarly, the redundancy in the spread spec-
trum signal allows for reliable communication. Hence, spread spectrum modulation satisfies the
constraints imposed by space based communication systems.
The parallel correlator forms an essential component in a digital communication system.
Typically, in spread spectrum systems, a parallel correlator computes the correlation of the incom-
ing data stream with a pre-determined pseudo-noise (PN) sequence of a fixed length. This correla-
tion value is used to estimate the output data. For a binary input data stream, the result of such an
operation essentially determines whether the output should be 0, 1 or indeterminate. An indeter-
minate output is primarily caused due to the receiver PN sequence not being the same as the trans-
mitter sequence. Thus, communication between different transceivers can be regulated on the
This work was supported by the U.S. Army Research Office under URI program contract DAAL03-92-G-
0109 and by ARPA under contract DAAH04-93-G-0242
basis of PN sequence uniqueness. This provides the capability of rejecting interference from
multiple transmission paths and jamming [1].
The correlator as described in this paper is particularly suited for direct sequence spread
spectrum systems that use binary phase shift keying as the digital modulation. Figure 1 shows the
essential function of the spread spectrum demodulator along with waveforms for desired signal
reception and jamming signal rejection. The serial input data stream is shifted with each clock
cycle and correlation is performed between the fixed PN sequence and as many stored bits of the
input data stream as the length of the PN sequence. The ability of the system to respond only to
the spreading code while rejecting others makes it useful in systems that experience jamming and
multipath interference. The same feature is the basis of code division multiple access (CDMA)
systems that allow multiple users to carry out independent messaging in a single spectrum band.
The correlation value between the incoming data stream and the PN sequence has to be generated
at each clock cycle. If a purely combinational circuit along with a shift register were chosen to
implement the correlator, for long PN sequences, it would result in extremely slow operation due
to many levels of logic required for computation of correlation. However, in a bit serial communi-
cations application as described in this paper, there is no data dependence and hence deep pipelin-
ing schemes can be effectively used to improve the throughput of the correlator.
2. Theoretical development
From a hardware viewpoint, correlation between the two binary streams can be repre-
sented as follows.
$R(\tau) = \sum_{t} \left( f(t) \oplus g(t - \tau) \right)$ (1)
Figure 1. Spread spectrum demodulator (block diagram with waveforms for desired signal reception and jamming signal rejection)
Here, f(t) and g(t) are binary data streams which specifically represent the PN sequence
and the input data stream for discussion of the parallel correlator. The XOR operator correlates
two binary inputs, i.e. it produces a logic 1 output only if the two signals are unlike. The
summation of the XOR outputs over the length of the signals gives a measure of the likeness between the
two signals. The difference between the number of 1s and the number of 0s in the correlation
vector will result in a number that ranges from the negative of the PN sequence length, through 0, up
to the positive of the PN sequence length, reflecting a 0, indeterminate and 1 output respectively.
Thresholds can be set for 0 and 1 detection to account for noise in the channel. The above number,
henceforth referred to as the correlation value, can also be written as follows.
$\text{Correlation Value} = 2 \cdot (\Sigma \text{ of 1s}) - (\text{PN sequence length})$ (2)
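As a minimal sketch of equation (2), the following Python function forms the XOR correlation vector of two equal-length bit sequences and returns twice the number of 1s minus the sequence length. The function name and the 8-bit example sequence are illustrative assumptions and do not come from the correlator design.

```python
def correlation_value(pn_bits, data_bits):
    """Correlation value per equation (2): 2 * (number of 1s) - length."""
    if len(pn_bits) != len(data_bits):
        raise ValueError("sequences must have equal length")
    # XOR each pair of bits to form the correlation vector.
    corr = [p ^ d for p, d in zip(pn_bits, data_bits)]
    ones = sum(corr)
    return 2 * ones - len(corr)


# Illustrative 8-bit sequence (hypothetical, chosen only for the example).
pn = [1, 0, 1, 1, 0, 0, 1, 0]
identical = list(pn)
inverted = [1 - b for b in pn]
half_flipped = [1 - b for b in pn[:4]] + pn[4:]

print(correlation_value(pn, identical))     # every XOR output is 0
print(correlation_value(pn, inverted))      # every XOR output is 1
print(correlation_value(pn, half_flipped))  # equal numbers of 1s and 0s
```

The two extreme inputs reach the negative and positive of the sequence length, and an input that differs in exactly half of the positions gives 0, the indeterminate case described above.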
3. RTD-HBT logic family
The current-voltage characteristics of an RTD can be approximated by the piecewise lin-
ear form shown in Figure 2.
Figure 2. Piecewise approximation of RTD characteristics (current versus voltage: Region 1 up to Vp, Region 2 between Vp and Vv, Region 3 beyond Vv; stable points V1 and V2; peak current Ip and valley current Iv)
As the voltage applied across the device terminals is increased from zero, the current
increases until Vp, the peak voltage of the RTD. The corresponding current is called the peak
current, Ip, of the RTD. As the voltage across the RTD is increased beyond Vp, the current through the
device drops abruptly due to tunneling until the voltage reaches Vv, the valley voltage. The current
at this voltage is the valley current, Iv. Beyond Vv, the current starts increasing again. For a current
in [Iv, Ip] there are two possible stable voltages: V1 < Vp or V2 > Vv. The tunneling characteristic of
the RTD facilitates implementation of self latching circuits.
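The two stable voltages can be illustrated with a small piecewise-linear model in Python. This is a sketch under stated assumptions: the peak and valley currents (100 µA and 25 µA) are the values quoted for the simulations in this paper, while the peak voltage, the valley voltage and the slope of Region 3 are hypothetical values chosen only to make the example concrete.

```python
def rtd_current(v, ip=100e-6, iv=25e-6, vp=0.3, vv=0.6):
    """Piecewise-linear RTD current in amperes at voltage v in volts."""
    if v <= vp:
        # Region 1: first positive differential resistance branch.
        return ip * v / vp
    if v <= vv:
        # Region 2: negative differential resistance branch.
        return ip + (iv - ip) * (v - vp) / (vv - vp)
    # Region 3: second positive branch (slope assumed equal to Region 1).
    return iv + (ip / vp) * (v - vv)


def stable_voltages(i, ip=100e-6, iv=25e-6, vp=0.3, vv=0.6):
    """Return (V1, V2), the two stable voltages for a current i in [iv, ip]."""
    if not iv <= i <= ip:
        raise ValueError("current must lie between the valley and peak currents")
    v1 = vp * i / ip              # intersection with Region 1
    v2 = vv + (i - iv) * vp / ip  # intersection with Region 3
    return v1, v2


v1, v2 = stable_voltages(50e-6)
print(v1, v2)
```

For any drive current between the valley and peak values the model returns one voltage below Vp and one above Vv, which is the bistability the logic gates rely on.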
3.1 Bistable mode operation
A binary logic circuit is said to operate in bistable mode when its output is latched, and
any change in the input is reflected in the output only when a clock or other evaluation signal is
applied. The bistable mode has been used in several earlier technologies, notably in superconduct-
ing logic [2]. Superconducting logic typically uses a multi-phase AC power source to periodically
reset/evaluate each gate. Similar logic using resonant tunneling devices has been proposed by sev-
eral authors [3, 4, 5]. The chief disadvantage of these circuits is the requirement of an AC power
source whose frequency determines the maximum switching frequency. The RTD+HBT logic cir-
cuits described below use a DC power supply and multiphase clocks but the clock signals are not
required to supply large amounts of power as in the case of the earlier circuits.
The operating principle of the new bistable element may be understood by considering the
circuit shown in Figure 3. There are m input transistors and one clock transistor driving a single
RTD load. The input transistors can be in either of two states: On, with a collector current of IH,
or Off, with no collector current. The clock transistor can be in one of two states: High, with
collector current ICLKH, and Quiescent, with collector current ICLKQ. In addition, there is a global reset
state where all the collector currents are 0. When the clock transistor current is at ICLKQ, the load
lines in Figure 3 show that the circuit has two possible stable operating points for every possible
input combination. When the clock current is ICLKH, there is exactly one stable operating point for
the circuit when n or more inputs are high, so that the sum of the collector currents is at least nIH + ICLKH.
This operating point corresponds to a logic 0 output voltage. Hence this circuit can be operated
sequentially to implement any non-weighted threshold logic function f(x1, x2, ..., xm; n), where
f(x1, x2, ..., xm; n) is 1 if and only if (x1 + x2 + ... + xm) < n, and x1, x2, ..., xm take on values of either
0 or 1.
Figure 3. RTD+HBT bistable logic gate operating principle (RTD characteristic with load lines for CLK H and CLK Q at IN = 0, n-1, n and m; IPEAK and IVALLEY marked on the current axis; reset, Vlow and Vhigh marked on the voltage axis)
The operating sequence is as follows:
1. Inputs I1 through Im change.
2. The reset line goes high, forcing all transistors into cut-off. The current through the
RTD falls below the valley current, and the fn node is pulled high.
3. The reset line goes back to 0. The fn node remains high.
4. The clk signal goes high, causing the total current through the RTD to increase. If
n or more inputs are high, the current through the RTD exceeds the peak current,
causing a jump to the second positive differential resistance (PDR) region of the RTD
characteristic corresponding to VRTD > VVALLEY, where VRTD is the voltage across the
RTD and VVALLEY is the valley voltage of the RTD. This results in the fn node going
low. If fewer than n inputs are high, the current through the RTD does not exceed the
peak current and the operating point remains in the first PDR region of the RTD,
where VRTD < VPEAK, and VPEAK is the RTD peak voltage. Thus, fn remains high.
5. The clk signal goes to its quiescent state so that the current through the clock
transistor is ICLKQ. The output voltage at node fn reaches a stable level corresponding to
whether the RTD was in the first PDR region or the second PDR region in the
previous step of the sequence.
For a three input circuit, three non-trivial threshold functions can be implemented for the
cases where n = 1, 2, 3. For n = 1, f1(x1, x2, x3) = 0 if and only if 1 or more inputs are high. This
corresponds to a NOR function. For n = 3, f3(x1, x2, x3) = 0 if and only if all 3 inputs are high. This
corresponds to a NAND function. For n = 2, f2(x1, x2, x3) = 0 if and only if 2 or more inputs are
high. This corresponds to an inverted majority or inverted carry function.
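The three cases can be checked exhaustively with a short Python sketch. It models only the logical behavior of the gate (output 1 if and only if fewer than n inputs are high), not the currents or the clocking; the function name is an illustrative assumption.

```python
from itertools import product


def threshold_gate(inputs, n):
    """Inverting threshold gate: output 1 iff fewer than n inputs are high."""
    return 1 if sum(inputs) < n else 0


# Compare the gate against NOR, inverted majority and NAND for all 8 inputs.
for a, b, c in product((0, 1), repeat=3):
    x = (a, b, c)
    nor = 1 - (a | b | c)
    minority = 1 - ((a & b) | (b & c) | (c & a))
    nand = 1 - (a & b & c)
    assert threshold_gate(x, 1) == nor
    assert threshold_gate(x, 2) == minority
    assert threshold_gate(x, 3) == nand
    print(x, threshold_gate(x, 1), threshold_gate(x, 2), threshold_gate(x, 3))
```

The same gate topology therefore yields all three functions, with only the threshold n changing.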
Figure 4 shows the simulated traces obtained from NDR-SPICE [6] for an inverter, a three
input NOR, and a three input MINORITY gate designed using RTDs and HBTs. It can be seen
that the outputs change only on arrival of the clock pulse and hence the circuits are operating in
bistable mode. Input and output voltage swings are matched to enable cascaded circuits to func-
tion correctly. The signal levels are 1V for logic zero and 2V for logic one.
Figure 4. RTD+HBT basic gates simulation (simulated voltage traces versus time in ns)
3.2 Design constraints
We now present the design equations for a k input threshold gate with a threshold value of
n. Let m be the area of the RTD used and let Jp and Jv represent the peak and valley current
densities of the RTD, respectively. IH, ICLKH and ICLKQ are defined as in section 3.1; in the
equations below the clock currents ICLKH and ICLKQ are abbreviated as ICH and ICQ. The design
constraints for the aforementioned gate can be written as:
$h = m J_p - (I_{CQ} + k I_H) > 0$ (3)
$l = I_{CQ} - m J_v > 0$ (4)
$hh = m J_p - (I_{CH} + (n-1) I_H) > 0$ (5)
$hl = I_{CH} + n I_H - m J_p > 0$ (6)
$I_{CH} > I_{CQ} > 0$ (7)
where,
h = quiescent clock, logic high switching margin
l = quiescent clock, logic low switching margin
hh = high clock, logic high switching margin
hl = high clock, logic low switching margin
The design process begins by choosing the input high and low voltages. The input high
and low voltages must respectively turn the input transistors on or off. To maintain good noise
margins, signal voltage swings should be maximized. However, for cascaded logic stages to
operate correctly without resorting to use of level shifters, it is necessary to match the input and output
voltage swings. An optimum match resulted in the signal voltage levels being set to 1V for logic 0
and 2V for logic 1. The input transistor size determines the value of IH. IH should be small to
minimize power consumption and area, but should be large enough to have good switching margins
hh and hl. The value of ICLKQ and the area of the RTD are determined from the equations
involving ICLKQ. The peak and valley current densities (Jp and Jv) are determined by the growth
process, and the RTD area factor m determines the actual currents. The simulations in this paper use
an RTD with peak current of 100 µA and a valley current of 25 µA for m = 1. Setting ICLKQ =
(m(Jp + Jv) - kIH)/2 satisfies both equations (3) and (4), when m is chosen such that m > 3IH/(Jp - Jv)
(that is, kIH/(Jp - Jv) with k = 3). This also results in the equalization of the switching margins h and l,
both of which become (m(Jp - Jv) - kIH)/2. Choosing ICLKH = mJp - (n - 0.5)IH satisfies the remaining
design equations and also equalizes the switching margins hh and hl. The clock line voltages and the
clock transistor sizes are determined from the values of ICLKQ and ICLKH. The switching margins hh
and hl are then 0.5IH, so a 50% variation is allowable in the collector current of any one input
transistor. When all transistors are systematically larger or smaller, the allowable variation before
the circuit malfunctions is 0.5IH/n. For a NOR gate, n = 1 and the allowable variation is 50%. For a
3-input inverted majority gate the allowable variation is 25% and for a 3-input NAND gate it is
about 16.7%. Thus, the switching margin of a NOR gate remains constant with increase in the number
of inputs whereas the switching margin of a NAND gate degrades rapidly with increase in the number
of inputs. Thus, the best design margins are provided by the NOR function and the NAND function
should be avoided in so far as possible.
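The design choices above can be checked numerically with a short Python sketch. The peak and valley currents per unit area factor (100 µA and 25 µA) are the values quoted for the simulations; the input current IH = 50 µA and the area factor m = 3 are hypothetical values chosen only so that m > 3IH/(Jp - Jv) holds for a 3-input gate. All currents are in microamps.

```python
def design_gate(k, n, i_h, m, j_p=100.0, j_v=25.0):
    """Clock currents and switching margins for a k-input gate with threshold n.

    All currents are in microamps; j_p and j_v are the peak and valley
    currents for an area factor of m = 1.
    """
    i_cq = (m * (j_p + j_v) - k * i_h) / 2   # quiescent clock current
    i_ch = m * j_p - (n - 0.5) * i_h         # high clock current
    return {
        "i_cq": i_cq,
        "i_ch": i_ch,
        "h": m * j_p - (i_cq + k * i_h),         # equation (3)
        "l": i_cq - m * j_v,                     # equation (4)
        "hh": m * j_p - (i_ch + (n - 1) * i_h),  # equation (5)
        "hl": i_ch + n * i_h - m * j_p,          # equation (6)
    }


# Hypothetical 3-input gate: i_h = 50 uA and m = 3 are assumed values.
for n in (1, 2, 3):
    gate = design_gate(k=3, n=n, i_h=50.0, m=3)
    assert gate["i_ch"] > gate["i_cq"] > 0       # equation (7)
    print(n, gate)
```

With these assumed values the quiescent margins h and l are equal, and hh and hl both equal 0.5IH for every threshold n, as the text states.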
3.3 Co-integration of RTDs and HBTs
RTDs and HBTs were integrated on the same wafer to build a 3-input threshold gate with
the same topology as the circuit shown in Figure 3. Figure 5 shows a photomicrograph of the
integrated circuit. The functionality of the circuit is determined by the input and clock voltages as
discussed in section 3.2. By adjusting the values of the supply voltage, input high voltage and
clock voltages, the NAND, NOR and MINORITY functions were tested; the oscilloscope traces
are shown in Figure 6. It should be noted that in the correlator design, the signal voltages are fixed
and hence functionality of the gates is determined by the device sizes.
Figure 5. Photomicrograph of RTD+HBT bistable gate (labeled pads and devices: Clock, Ground, Reset, Vcc, inputs A, B and C, Vout, resistor, HBT and RTD)
3.4 Pipelined computation
Pipelining is a well studied means of speeding up any computation. An existing combina-
tional block is divided into several sequential stages such that each stage performs a different
operation during a particular clock cycle. The drawback of pipelining is that each computation
takes the same or more time as nanopipelining [7] but there is an added penalty in the area
devoted to the pipeline latches in the circuit.
Figure 6. Oscilloscope traces for fabricated RTD+HBT primitive gates (panels include NOR and NAND; traces CLK, RESET, IN1, IN2 and IN3)
Consider a combinational block that is composed of n stages with each stage having a
delay of tc. This results in a total delay of n·tc. We could partition the combinational block into k
pipeline stages, with k between 1 and n, where each stage output is latched. If we assume a latch
delay of tl, the maximum stage delay of the circuit is now (n·tc/k + tl). The throughput of the circuit
increases from 1/(n·tc) to 1/(n·tc/k + tl) but the latency increases from n·tc to n·tc + k·tl, which is
n·(tc + tl) when k = n. Also, if ac is the area of the combinational block, then after pipelining the
area of the circuit increases to ac + k·m·al, where al is the area of a latch and m is the number of
latches at each stage. The best possible theoretical throughput
would be 1/tc when we have latches at the output of each combinational stage. However, if all
combinational stages don't have the same delay, then the maximum achievable throughput with
the use of separate pipeline latches is 1/(b·tc + tl) where b·tc is the longest combinational stage
delay. If the latch delay tl is much larger than the longest combinational delay b·tc, it places an
upper bound on the maximum achievable throughput of the pipelined circuit. Thus, we see that
pipelining using conventional logic results in direct trade-offs between the area of the pipeline
latch and the achievable throughput. The use of bistable NDR devices in designing circuits
improves the performance of nanopipelined circuits over conventional pipelined circuits because
the latch delay tl = 0. Also, if latency is not of concern, each logic gate can operate in the bistable
mode resulting in maximum possible throughput.
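The throughput and latency expressions can be compared with a few lines of Python. This is a sketch under assumed numbers: the gate delay of 50 ps, the latch delay of 100 ps and the depth of 8 logic levels are hypothetical and are not taken from the correlator. Latency is taken as k clock periods, which equals n·(tc + tl) when every level is latched (k = n).

```python
def pipeline_metrics(n, t_c, k, t_l):
    """Throughput (1/s) and latency (s) for n logic levels in k latched stages."""
    cycle = n * t_c / k + t_l   # longest stage delay plus latch delay
    return 1.0 / cycle, k * cycle


N_LEVELS = 8        # hypothetical logic depth
T_GATE = 50e-12     # hypothetical gate delay in seconds
T_LATCH = 100e-12   # hypothetical latch delay in seconds

cases = {
    "unpipelined": pipeline_metrics(N_LEVELS, T_GATE, 1, 0.0),
    "latched every level": pipeline_metrics(N_LEVELS, T_GATE, N_LEVELS, T_LATCH),
    "self-latching gates": pipeline_metrics(N_LEVELS, T_GATE, N_LEVELS, 0.0),
}
for name, (throughput, latency) in cases.items():
    print(f"{name}: {throughput / 1e9:.2f} G results/s, latency {latency * 1e12:.0f} ps")
```

With tl = 0 the throughput reaches 1/tc while the latency stays at n·tc, which is the advantage claimed for self-latching gates.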
3.5 Nanopipelined full adder implementation
The basic bistable logic gates mentioned previously are used to build a nanopipelined full
adder that best illustrates the advantages of the NDR logic family. For the parallel correlator, we
prefer an adder with complementary sum and carry outputs in order to reduce the number of pipe-
line stages and hence the latency of the circuit. The complementary sum and carry functions for a
1-bit full adder are written as follows.
$\overline{S} = \overline{a \oplus b \oplus c_{in}}$ (8)
$\overline{C} = \overline{a \cdot b + b \cdot c_{in} + c_{in} \cdot a}$ (9)
The $\overline{S}$ function is implemented as a three level nanopipelined circuit whereas the
$\overline{C}$ function is implemented using a single minority gate. The circuit for the 1-bit full adder is shown in
Figure 7. It is apparent that the $\overline{S}$ and $\overline{C}$ outputs are not synchronized with each other. For a single
stage of addition, we would need to add two bistable buffers at the $\overline{C}$ output to synchronize the
$\overline{S}$ and $\overline{C}$ outputs. However, in the correlator we perform several successive stages of addition and
synchronization at each adder will result in increased latency. Hence, synchronization is
performed after all stages of addition are complete. For correct operation of the true-bistable logic
gates a reset and evaluate pulse is required as mentioned previously. However, when multiple
gates are cascaded, as in the implementation of the full adder, a gate must be evaluated only after
all its inputs have been correctly evaluated. This requires a two-phase evaluation scheme in which
each gate is evaluated in a different phase than its fan-ins and fan-outs. An example timing
relationship between phases of consecutive logic blocks for the parallel correlator is illustrated in
Figure 8.
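The complementary sum and carry functions of equations (8) and (9) can be checked against ordinary addition with a short Python sketch. It models only the logic, not the three-level structure or the clock phases; the function names are illustrative assumptions.

```python
from itertools import product


def minority(a, b, c):
    """Three-input threshold gate with n = 2: 1 iff fewer than two inputs are high."""
    return 1 if (a + b + c) < 2 else 0


def full_adder_complemented(a, b, c_in):
    """Return (S_bar, C_bar), the complemented sum and carry of a 1-bit full adder."""
    s_bar = 1 - (a ^ b ^ c_in)
    c_bar = 1 - ((a & b) | (b & c_in) | (c_in & a))
    return s_bar, c_bar


for a, b, c_in in product((0, 1), repeat=3):
    s_bar, c_bar = full_adder_complemented(a, b, c_in)
    # The true outputs (1 - S_bar, 1 - C_bar) must encode a + b + c_in.
    assert (1 - s_bar) + 2 * (1 - c_bar) == a + b + c_in
    # The complemented carry is exactly one minority gate.
    assert c_bar == minority(a, b, c_in)
    print(a, b, c_in, s_bar, c_bar)
```

The check confirms that the complemented carry needs no logic beyond a single three-input threshold gate with n = 2.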
Figure 7. 1-bit nanopipelined full adder with complementary outputs (inputs A, B and C; STAGE1 clocked by RES1 and CLK1, followed by STAGE2 and STAGE3; outputs SUM and COUT)
Parasitics used in the simulation: 10 fF across each RTD; 10 fF at the output of each inverter and 2-input gate; 15 fF at the output of each 3-input gate; distributed RC of a 500 µm x 2 µm line (9 Ω, 30 fF) on the RES1, CLK1, RES2 and CLK2 lines.
The res1 and clk1 signals form phase 1 of the clock whereas res2 and clk2 form phase 2 of
the clock. The two phases of the clock must be non-overlapping. However, the reset and clock
signals of a phase may partially overlap as shown in Figure 8. A large overlap period between the
aforementioned signals is not desirable since the circuit output is not valid during this time. The
simulated output for the 1-bit nanopipelined adder is shown in Figure 9. To project realistic
performance, load capacitances and parasitics have been added to the RTDs and HBTs used in the
circuit. Also, clock and reset lines are assumed to be global lines with distributed RC parasitic
elements as shown in Figure 7. The circuit outputs are assumed to drive global bus lines across
the chip. The two phase clock consisting of reset1-clock1 and reset2-clock2 operates at 10 GHz.
Figure 8. Multiphase timing scheme (waveforms RES1, CLK1, RES2 and CLK2)
4. Correlator implementation
The block diagram of the pipelined correlator is illustrated in Figure 10. A 32-bit latch
holds the PN sequence. The input is a serial bit stream which is fed to a 32-bit shift register. The
32-bit latch and 32-bit shift register are each composed of 64 bistable inverters. A pair of cascaded
bistable inverters each operating on single, separate phases of the two-phase clock form the basic
1-bit latch. The 32-bit raw correlation vector is generated by performing a bitwise XOR operation
on the PN sequence latch output and the most recent 32 bits of the sampled signal available at the
shift register output. The raw correlation vector is registered and this forms the input to the
pipelined adder network that determines the difference between the number of 1s and 0s in the raw
correlation vector. This is the correlation value between the incoming signal and the resident PN
sequence and is determined for the 32 most recent data bits at every clock cycle. This value ranges
from -32 to +32. The functional description of the correlator is illustrated in the equations (10)
through (14).
$data[31 \leftarrow 0] = \{ D^{32}(d_{in}), D^{31}(d_{in}), \ldots, D^{1}(d_{in}) \}$ (10)
$code[31 \leftarrow 0] = \{ D^{1}(PN_{31}), D^{1}(PN_{30}), \ldots, D^{1}(PN_{0}) \}$ (11)
$corr[31 \leftarrow 0] = code[31 \leftarrow 0] \oplus data[31 \leftarrow 0]$ (12)
$sum[5 \leftarrow 0] = \sum_{i=0}^{31} corr[i]$ (13)
$diff[6 \leftarrow 0] = 32 - 2 \cdot sum[5 \leftarrow 0]$ (14)
Here, $D^{i}(s)$ represents the value of signal s, i clock cycles prior to the current input.
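A behavioral sketch of equations (10) through (14) in Python is given below. It is not a model of the nanopipelined hardware: pipeline latency and clock phases are ignored, and the class and method names are illustrative assumptions. The sign of diff is also an assumption, taken from the form of equation (14) (32 minus twice the number of 1s in corr).

```python
class Correlator:
    """Behavioral model of the correlator equations; no pipeline latency."""

    def __init__(self, pn_word, width=32):
        self.width = width
        self.mask = (1 << width) - 1
        self.code = pn_word & self.mask   # latched PN sequence, equation (11)
        self.data = 0                     # shift register contents, equation (10)

    def clock(self, d_in):
        """Shift in one bit and return (corr, ones, diff) for this cycle."""
        # The newest bit enters at bit 0; the oldest bit leaves from the top.
        self.data = ((self.data << 1) | (d_in & 1)) & self.mask
        corr = self.code ^ self.data      # equation (12)
        ones = bin(corr).count("1")       # equation (13)
        diff = self.width - 2 * ones      # equation (14), assumed sign convention
        return corr, ones, diff


# Alternating input starting with 1, correlated against AAAAAAAA hex.
correlator = Correlator(0xAAAAAAAA)
results = [correlator.clock(1 - (i % 2)) for i in range(34)]
for corr, ones, diff in results[31:]:
    print(f"corr={corr:08X} ones={ones} diff={diff:+d}")
```

Once the shift register has filled, the register contents toggle between AAAAAAAA hex and 55555555 hex, so corr alternates between all 0s and all 1s and diff alternates between the two extremes of its range.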
Figure 9. 1-bit nanopipelined adder simulation including parasitic elements (traces A, B, C, CLK, RES, SUM and COUT versus time, 0.0 to 1.5 ns)
Figure 10. Pipelined correlator block diagram (the PN sequence is latched as code[31:0] and the serial data input is shifted into data[31:0]; their bitwise XOR forms corr[31:0], which feeds the pipelined adder network producing diff[6:0])
4.1 Pipelined Adder Network
The adder network consisting of 26 nanopipelined full adders, 11 nanopipelined half
adders, and 36 bistable inverters is illustrated in Figure 11. The adders used in the design have
complemented sum and carry outputs in order to reduce pipeline latency. The input to the adder
network is the raw correlation vector generated by the 32-bit bistable XOR network. The circuit
performs eighteen stages of addition to generate a 7-bit result which is the difference between the
number of 1s and number of 0s in the correlation vector. Since each stage is nano-pipelined due to
use of self latching gates in the bistable adders, the throughput of the circuit is one 32-bit
correlation every cycle. However, since the seven bits of the adder network output are not simultaneously
generated, bistable inverters are required to synchronize the bits such that all seven bits of a
correlation appear in order at the output of the correlator. The least significant bit of the correlation
value is always 0 since the difference between the number of 1s and number of 0s in a 32-bit
vector is always even. The pipelined adder network essentially sums up the number of 1s in the
correlation vector. Bits 0, 1, 2, 3 and 4 of the sum of 1s directly translate to bits 1, 2, 3, 4 and 5 of the
difference between number of 1s and number of 0s. Bit 6 of the correlation value is computed
while bit 5 of the sum of 1s is being generated by connecting the carry input of the final full adder
to Vdd. This achieves the 2s complement subtraction required for computing the difference
between the number of 1s and number of 0s in the correlation vector. No additional pipe stages are
required for this conversion.
The functional simulation of the 32-bit parallel correlator is shown in Figure 12. The PN
sequence for this simulation is chosen to be AAAAAAAA Hex. Note, that this is not an optimum
PN sequence but rather is chosen for the ease of illustration of the functionality of the correlator.
The input is a pattern of alternating 1s and 0s which results in the 32-bit shift register output
toggling between AAAAAAAA Hex and 55555555 Hex at each cycle. This causes the raw
correlation vector to alternate between all 1s (FFFFFFFF Hex) and all 0s (00000000 Hex) with each
cycle. Thus, the desired correlation difference should be +32 decimal and -32 decimal
respectively for the two cases mentioned above. This is seen to be the case in the simulation output. It
should be noted that the simulation output reflects changes in the input 10 cycles prior to the out-
put due to pipeline latency. However, the same input pattern has been maintained and is shown in
the current plot for the purpose of illustration.
4.2 Comparison with CMOS technology
The correlator designed using RTDs and HBTs is compared with a CMOS implementation
using 0.5 micron process technology. The results of the comparison for three circuits - the basic
bistable majority gate, the bistable full adder and the 32-bit parallel correlator - are presented in
Table 1.
Table 1: Comparison of RTD+HBT circuits with CMOS implementations
                       Bistable majority        Bistable full adder      32-bit parallel correlator
Parameter              CMOS (0.5 µm)  RTD+HBT   CMOS (0.5 µm)  RTD+HBT   CMOS (0.5 µm)  RTD+HBT
Device count           20             5         68             34        6000           2060
Power dissipation      0.7 mW         2 mW      2.3 mW         12 mW     600 mW         1.5 W
Speed                  400 MHz        20 GHz    400 MHz        10 GHz    400 MHz        10 GHz
Power-delay product    1.75 pJ        0.1 pJ    5.75 pJ        1.2 pJ    1.5 nJ         0.15 nJ
Figure 11. Pipelined Adder Network (input corr[31:0]; outputs diff[0] through diff[6]; each dot represents one pipeline stage operating on a single clock phase. Legend: FA, bistable full adder with complemented sum and carry outputs; HA, bistable half adder with complemented sum and carry outputs; I3, three stage inverter; I1, single stage inverter)
Figure 12. 32-bit parallel correlator functional simulation
The RTD+HBT based correlator offers a tenfold improvement in power-delay product
even though it consumes greater absolute power. The smaller number of devices used in the
correlator also implies a reduction in wiring lengths, and hence the parasitics and delays associated with
interconnects are much smaller in the RTD+HBT correlator.
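The power-delay products in Table 1 follow from dividing the power dissipation by the operating frequency, which the following Python sketch reproduces. The power and speed figures are copied from Table 1; treating the delay as one clock period is an assumption of this sketch, and the dictionary layout and function name are illustrative.

```python
# (power in watts, clock rate in hertz) copied from Table 1.
TABLE_1 = {
    "bistable majority": {"CMOS": (0.7e-3, 400e6), "RTD+HBT": (2e-3, 20e9)},
    "bistable full adder": {"CMOS": (2.3e-3, 400e6), "RTD+HBT": (12e-3, 10e9)},
    "32-bit parallel correlator": {"CMOS": (0.6, 400e6), "RTD+HBT": (1.5, 10e9)},
}


def power_delay_product(power_w, rate_hz):
    """Energy per operation in joules, taking the delay as one clock period."""
    return power_w / rate_hz


for circuit, technologies in TABLE_1.items():
    pdp = {name: power_delay_product(*values) for name, values in technologies.items()}
    ratio = pdp["CMOS"] / pdp["RTD+HBT"]
    print(f"{circuit}: CMOS {pdp['CMOS']:.3e} J, "
          f"RTD+HBT {pdp['RTD+HBT']:.3e} J, ratio {ratio:.1f}")
```

For the 32-bit correlator the ratio of the two power-delay products is 10, which is the tenfold improvement quoted above.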
Conclusions
The synchronous, sequential nature of true-bistable gates using RTDs and HBTs has been
exploited to build a very high speed and compact parallel correlator. Design equations and con-
straints have been studied and a design methodology for RTD+HBT bistable logic gates is pro-
posed. The bistable nature of the logic gates has demonstrated advantages over conventional logic
families by eliminating pipeline area and delay overheads in deep pipelined logic systems result-
ing in improved throughput and smaller circuit size. The compact implementation of threshold
functions allows a single gate carry function which facilitates design of high speed arithmetic and
logic functions used in the correlator. Reduction in device count has led to shorter interconnec-
tions resulting in reduced parasitic delays. The nanopipelined correlator offers a tenfold lower
power-delay product as compared to a state of the art CMOS implementation. The proposed
design style has applications in the development of high speed digital communication system
architectures to achieve several Gb/s data throughput. In particular, for space based communica-
tion systems, nanopipelined RTD+HBT based logic designs offer compact solutions with very
low power-delay products.
References
[1] R. E. Ziemer and R. L. Peterson, Digital Communications and Spread Spectrum Systems, New
York: Macmillan, 1985.
[2] T. Van Duzer, "Superconductor digital ICs," in VLSI Handbook (J. DiGiacomo, ed.), New York:
McGraw-Hill, 1989.
[3] K. Maezawa and T. Mizutani, "A new resonant tunneling logic gate employing monostable-
bistable transition," Japanese Journal of Applied Physics, vol. 32, pp. L42-L44, Jan. 1993.
[4] S. Mohan, P. Mazumder, R. K. Mains, J. P. Sun, and G. I. Haddad, "Logic design based on
negative differential resistance characteristics of quantum electronic devices," IEE
Proceedings-G, vol. 140, pp. 383-391, Dec. 1993.
[5] W. Williamson, III, "High-speed RTD-based logic for systems applications," in ULTRA
electronics program review, ARPA, Oct. 1994.
[6] P. Mazumder, J. P. Sun, S. Mohan, and G. I. Haddad, "DC and transient simulation of resonant
tunneling devices in NDR-SPICE," in 21st International Symposium on Compound
Semiconductors, 1994.
[7] S. Mohan, P. Mazumder, and G. I. Haddad, "Ultra-fast pipelined arithmetic using quantum
electronic devices," IEE Proceedings-E, vol. 141, pp. 104-110, Mar. 1994.

