Ultra low power CMOS technology by Burr, J. & Peterson, A.
3rd NASA Symposium on VLSI Design 1991
Ultra Low Power CMOS Technology
J. Burr and A. Peterson
Space, Telecommunications, and Radioscience Laboratory
Department of Electrical Engineering
Stanford University
Stanford, Ca. 94305
burr@mojave.stanford.edu
Abstract - This paper discusses the motivation, opportunities, and problems
associated with implementing digital logic at very low voltages, including
the challenge of making use of the available real estate in 3D multichip
modules, energy requirements of very large neural networks, energy optimization
metrics and their impact on system design, modeling problems, circuit design
constraints, possible fabrication process modifications to improve performance,
and barriers to practical implementation.
1 Introduction
As technology continues to scale into the submicron regime, massively parallel
architectures are increasingly constrained by power considerations, and minimizing
the energy per operation throughout the system is assuming increasing importance.
We are investigating "Ultra Low Power CMOS", which operates with supply and
threshold voltages of a few hundred millivolts, to reduce the energy per operation
by more than 100 times in massively parallel signal processors, microsatellites,
and large scale neural networks.
In this paper, we show that minimum energy per operation is achieved in the
subthreshold regime, and that the optimum performance is obtained when Vdd = Vt and
Gnd = Vt - Vdd. We also show that minimum energy × time occurs when Vdd = 3Vt. We
show that Vt should be chosen such that Ion/Ioff = ld/a, where ld is the logic depth
and a is the activity ratio, the fraction of gates which are switching at any given
time. We also show that ld = 11 minimizes energy in a 32x32 bit parallel multiplier.
2 Motivation
The application domains we are targeting include wideband spectrometers requiring 10^12
operations per second, microsatellites with 100mW power budgets, large scale neural
networks requiring 10^15 connections per second and 1fJ per connection, and small,
massively parallel digital signal coprocessors.
As an example, a single SBus slot in a Sun SPARCstation occupies about 200cm^3, can
accommodate over 2000cm^2 of active silicon using 3D stacked multichip module technology,
and has a power budget of 10W (see Fig 1). An architecture with a power density of
2W/cm^2 and 40 MIPS per chip, typical of modern microprocessors, would dissipate 4kW
if tiled over the available area and achieve 80 billion operations per second. Only 5cm^2
of silicon can be used at 10W, yielding 200 MIPS. If the supply voltage is lowered to
700mV, each chip would dissipate 5mW, and the entire 2000cm^2 could be used to achieve
10 billion operations per second at 10W.
Figure 1: 3D MCM in an SBus slot: 2000cm^2, 10W max. Vdd = 0.7V permits 10 GIPS.
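The arithmetic behind the SBus example can be checked with a few lines of Python. This is only a sketch using the figures quoted in the text; the chip area of 1cm^2 is an assumption (it is implied by 2W/cm^2, 4kW and 80 billion operations per second, but not stated), and all variable names are illustrative.

```python
# Figures quoted in the text for one SBus slot.
AREA_CM2 = 2000.0           # active silicon available in the 3D stacked MCM
BUDGET_W = 10.0             # slot power budget
CONV_W_PER_CM2 = 2.0        # conventional power density
CONV_MIPS_PER_CHIP = 40.0   # conventional throughput per chip
LOW_V_CHIP_W = 5e-3         # per-chip dissipation at Vdd = 700mV
LOW_V_TOTAL_OPS = 10e9      # throughput quoted for the low-voltage case

CHIP_AREA_CM2 = 1.0         # ASSUMPTION: one chip per cm^2 (not stated in the text)

chips_total = AREA_CM2 / CHIP_AREA_CM2

# Conventional design tiled over the whole area.
conv_power_w = AREA_CM2 * CONV_W_PER_CM2
conv_ops = chips_total * CONV_MIPS_PER_CHIP * 1e6

# Conventional design restricted to the 10W budget.
usable_area_cm2 = BUDGET_W / CONV_W_PER_CM2
budget_mips = (usable_area_cm2 / CHIP_AREA_CM2) * CONV_MIPS_PER_CHIP

# Low-voltage design tiled over the whole area.
low_v_power_w = chips_total * LOW_V_CHIP_W
low_v_mips_per_chip = LOW_V_TOTAL_OPS / chips_total / 1e6

print(f"conventional, full area : {conv_power_w:.0f} W, {conv_ops:.1e} ops/s")
print(f"conventional, 10W budget: {usable_area_cm2:.0f} cm^2, {budget_mips:.0f} MIPS")
print(f"low voltage, full area  : {low_v_power_w:.1f} W, "
      f"{low_v_mips_per_chip:.0f} MIPS per chip")
```

Under these assumptions the low-voltage case trades per-chip speed (about 5 MIPS instead of 40) for the ability to power every chip in the slot.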
3 Background
Low voltage digital logic is not new. Richard Swanson described a 100mV CMOS ring
oscillator in [6]. Eric Vittoz discussed subthreshold design techniques used in the digital
watch industry in [4]. Carver Mead described a variety of subthreshold analog circuits
for neural networks in [1]. We believe that low voltage circuits can be used effectively for
massively parallel computation in power constrained environments, and that lowering the
voltage in submicron technologies has the added benefit of maintaining manageable signal
frequencies at the system level.
4 Transistor Current
The following equations [6,7] describe drain current as a function of gate voltage, as shown
in Fig 2.
Figure 2: Transistor current vs voltage (log(Ids) vs Vgs, with subthreshold and saturation
regions on either side of Vt). Current is exponential with voltage below Vt, and
quadratic above Vt.
Figure 3: Model discontinuity at Vgs = Vt (log-scale drain current vs Vgs; the gap is
labeled "threshold mismatch"). The subthreshold model says Ids = knVT^2. The
saturation model says Ids = (k/2)(Vgs - Vt)^2 = 0. In the figure Vt = 200mV.
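subthreshold: $V_{gs} < V_t$; $I_0 = k n V_T^2$
$$ I_{ds} = I_0\, e^{(V_{gs} - V_t)/(n V_T)} \left(1 - e^{-V_{ds}/V_T}\right) $$
saturation: $V_t < V_{gs} < V_{ds} + V_t$
$$ I_{ds} = \frac{k}{2}\,(V_{gs} - V_t)^2 $$
linear: $V_{ds} + V_t < V_{gs}$
$$ I_{ds} = \frac{k}{2}\left(2(V_{gs} - V_t)V_{ds} - V_{ds}^2\right) $$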
where Vgs is the gate-source voltage, Vt is the threshold voltage, Ids is the drain current, k
is the transconductance in A/V^2, n is the gate coupling coefficient, usually around 0.7, VT
is the thermal voltage, 0.026V, and I0 is the current at Vgs = Vt.
Note the exponential dependence of current on voltage below Vt, and the quadratic
dependence above Vt. These equations do a poor job of modeling behavior in the neigh-
borhood of Vt (see Fig 3).
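The discontinuity at threshold can be reproduced numerically. The sketch below is a minimal Python rendering of the three-region model, assuming the subthreshold prefactor is I0 = k n VT^2 and the subthreshold exponent is (Vgs - Vt)/(n VT). The value of k is a placeholder chosen for illustration and is not taken from the paper; the function and constant names are likewise illustrative.

```python
import math

VT_THERMAL = 0.026   # thermal voltage VT, in volts
N = 0.7              # gate coupling coefficient quoted in the text
K = 50e-6            # ASSUMED transconductance in A/V^2 (illustrative only)
VT = 0.2             # threshold voltage used in Fig 3, in volts


def ids(vgs, vds, vt=VT, k=K, n=N):
    """Drain current from the piecewise subthreshold/saturation/linear model."""
    if vgs < vt:
        # Subthreshold: exponential in Vgs, with I0 the current at Vgs = Vt.
        i0 = k * n * VT_THERMAL ** 2
        return (i0 * math.exp((vgs - vt) / (n * VT_THERMAL))
                * (1.0 - math.exp(-vds / VT_THERMAL)))
    if vgs < vds + vt:
        # Saturation: quadratic in the gate overdrive.
        return 0.5 * k * (vgs - vt) ** 2
    # Linear region.
    return 0.5 * k * (2.0 * (vgs - vt) * vds - vds ** 2)


just_below = ids(VT - 1e-9, 0.5)   # subthreshold branch, approaches I0
at_threshold = ids(VT, 0.5)        # saturation branch, exactly zero
print(f"Ids just below Vt: {just_below:.3e} A")
print(f"Ids at Vt        : {at_threshold:.3e} A")
```

The two branches disagree at Vgs = Vt: the subthreshold expression approaches I0 while the saturation expression is zero, which is the mismatch shown in Fig 3. The saturation and linear branches, by contrast, agree at Vgs = Vds + Vt.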
Figure 4: Performance vs voltage for different values of Vt (relative performance vs
supply and threshold voltage, plotted against Vdd).
Performance can be approximated when the supply voltage is over threshold by
$$ f = I/Q = \frac{k\,(V - V_t)^2}{C V} $$
where f is the clock frequency, k is transconductance, and C is the capacitance being
switched.
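A short sketch of how this approximation behaves is given below. It assumes the form f = k(V - Vt)^2/(CV) with k/C normalized to 1, and it simply returns zero at or below threshold, where the approximation does not apply. Both of these choices, and the function name, are illustrative assumptions.

```python
def relative_frequency(vdd, vt):
    """Relative clock frequency (V - Vt)^2 / V, with k/C normalized to 1.

    The above-threshold approximation is not valid for vdd <= vt, so this
    sketch returns 0.0 there instead of modeling subthreshold operation.
    """
    if vdd <= vt:
        return 0.0
    return (vdd - vt) ** 2 / vdd


# Tabulate relative performance for a few thresholds and supply voltages.
for vt in (0.0, 0.2, 0.4, 0.8):
    row = [round(relative_frequency(v, vt), 3) for v in (0.5, 1.0, 2.0, 5.0)]
    print(f"Vt = {vt:.1f} V: {row}")
```

With Vt = 0 the expression reduces to f proportional to Vdd, and a larger Vt costs proportionally more performance at low supply voltages than at high ones.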
5 Optimum Logic Depth
Figure 5: Optimum logic depth of a 32x32 bit tree multiplier (multiplier power vs logic
depth). For a given ld, the supply voltage is lowered to match the unpiped throughput.
Minimum power consumption occurs at ld = 11. Latch energy increases as ld decreases,
eventually exceeding logic energy, which decreases as ld decreases.
Figure 6: Relative area vs logic depth in a 32x32 bit multiplier (multiplier area vs logic
depth). The area penalty at ld = 11 is 37%.
We found the optimum logic depth in a 32 x 32 bit tree multiplier by reducing the supply
voltage to keep the throughput constant (see Fig 5). We also found the area penalty using
this approach (see Fig 6). ld = 11 is close to the propagation delay through a 4:2 adder
[2].
6 Minimum Energy
The current available to switch a node is the difference between the current of the ON
device and the leakage current of the OFF device. In standard CMOS, Vt is so high that
Ioff can be ignored, but in low voltage applications it can be an appreciable fraction of
Ion:
$$ t_{pd} = \frac{Q}{I} = \frac{C_g V}{I} = \frac{C_g V}{I_{on} - I_{off}} $$
$$ E_{dc} = I_{off} V\, l_d\, t_{pd} = C_g V^2\, \frac{l_d}{I_{on}/I_{off} - 1} $$
$$ E_{ac} = \frac{1}{2}\, a\, C_g V^2 $$
$$ E = E_{ac} + E_{dc} = \frac{1}{2}\, C_g V^2 \left( a + \frac{2\, l_d}{I_{on}/I_{off} - 1} \right) $$
E is minimum when Ion/Ioff is maximum. Referring to Fig 2, Ion/Ioff is maximum and
constant in the subthreshold region.
In the subthreshold region, if Vdd = V = Vhi - Vlo, then
$I_{on}/I_{off} = e^{(V_{hi} - V_{lo})/(n V_T)} = e^{V/(n V_T)}$,
so E depends only on V = Vhi - Vlo. Therefore, for a given Vdd, energy is constant
in the subthreshold region. For maximum performance at minimum energy, set Vhi = Vt
and Vlo = Vt - Vdd.
DC energy rises exponentially as Vdd decreases. AC energy rises quadratically as Vdd
increases. For optimum Vt,
$$ P_{ac} = a\, C V^2 f $$
$$ P_{dc} = I_{off} V $$
$$ I_{on} = l_d\, C V f $$
If Pac = Pdc and Vdd = Vt, then
$$ I_{on}/I_{off} = l_d / a = e^{V_t/(n V_T)} $$
$$ V_t = n V_T \ln(I_{on}/I_{off}) $$
Figs 7 and 8 show energy vs Vdd. Table 1 lists the voltages and energies at the global
minima.
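The optimum-Vt condition can be checked numerically. The sketch below assumes the relations Pac = aCV^2 f, Pdc = Ioff V, Ion = ld C V f and Ion/Ioff = e^(V/(nVT)) from this section; the capacitance, frequency and n = 1 are placeholder values chosen only to exercise the formulas, and the function names are illustrative.

```python
import math

VT_THERMAL = 0.026  # thermal voltage VT, in volts


def optimum_vt(ld, a, n=1.0):
    """Threshold at which Ion/Ioff = ld/a, i.e. Vt = n*VT*ln(ld/a)."""
    return n * VT_THERMAL * math.log(ld / a)


def on_off_ratio(v, n=1.0):
    """Subthreshold on/off current ratio e^(V/(n*VT)) for a swing of v volts."""
    return math.exp(v / (n * VT_THERMAL))


def ac_dc_power(v, ld, a, c=100e-15, f=1e6, n=1.0):
    """Return (Pac, Pdc) for supply v; c and f are ASSUMED placeholder values."""
    i_on = ld * c * v * f               # current needed to meet the cycle time
    i_off = i_on / on_off_ratio(v, n)   # leakage implied by the on/off ratio
    return a * c * v ** 2 * f, i_off * v


vt = optimum_vt(ld=10, a=0.1)
p_ac, p_dc = ac_dc_power(vt, ld=10, a=0.1)
print(f"optimum Vt = {vt * 1e3:.1f} mV, Pac = {p_ac:.3e} W, Pdc = {p_dc:.3e} W")
```

With Vdd set to this Vt the two power terms come out equal, which is the balance condition used above to choose the threshold.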
Figure 7: Energy vs supply voltage for a = 0.10, ld = 10 in 2µ CMOS (energy vs supply
and threshold voltage, with an expanded view of Vdd from 0 to 0.5V). Emin occurs at
Vdd = 220mV, Vt = 120mV.
Figure 8: Energy vs supply voltage for a = 0.01, ld = 40 in 2µ CMOS (energy vs supply
and threshold voltage for Vt = [0:.04:.8], with an expanded view of Vdd from 0 to 0.8V).
Emin occurs at Vdd = 380mV, Vt = 280mV.
7 Minimum Energy × Time
Figure 9: 1/(energy × time) vs Vdd and Vt, for Vt = [.08:.08:.8]. Et_min occurs at 3Vt.
The minimum energy solution is quite slow. Performance should improve dramatically in
deep submicron and with low voltage process optimizations. An alternative approach is to
minimize energy × time. If we assume transistors operate mostly in saturation, then,
dropping constant factors,
$$ Et = V^2 Q / I \propto \frac{V^3}{(V - V_t)^2} $$
$$ Et_{min} = \frac{9}{4} V \quad \text{at } V = 3V_t $$
Fig 9 shows a maximum at 3Vt which grows much more pronounced at low voltage.
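The location of the energy × time minimum is easy to confirm numerically. The sketch below assumes the normalized form Et = V^3/(V - Vt)^2 with constant factors dropped and uses a plain grid search; the search range and step count are arbitrary choices made for illustration.

```python
def energy_time(v, vt):
    """Normalized energy x time product V^3 / (V - Vt)^2 above threshold."""
    return v ** 3 / (v - vt) ** 2


def argmin_energy_time(vt, steps=100_000):
    """Grid-search the supply voltage that minimizes energy x time.

    The search runs from just above threshold (1.05*Vt) to 10*Vt.
    """
    lo, hi = 1.05 * vt, 10.0 * vt
    candidates = (lo + (hi - lo) * i / steps for i in range(steps + 1))
    return min(candidates, key=lambda v: energy_time(v, vt))


vt = 0.2
v_best = argmin_energy_time(vt)
print(f"Vt = {vt} V: minimum at V = {v_best:.4f} V "
      f"(3*Vt = {3 * vt:.4f} V), Et = {energy_time(v_best, vt):.4f}")
```

Setting the derivative of V^3/(V - Vt)^2 to zero gives 3(V - Vt) = 2V, so V = 3Vt, where the product equals 27Vt/4 = 9V/4. The grid search agrees with this to within its step size.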
8 Circuit Design Constraints
A number of interesting circuit design constraints appear when leakage currents are large,
and when the dependence of current on voltage is exponential. Three constraints we have
observed to date:
• Dynamic circuits are difficult to manage. A minimum size transistor will have a
leakage current of about 1nA at Vt = 160mV. A dynamic storage node with 100fF
of capacitance will hold 50fC of charge at Vdd = 0.5V. A change of 100mV requires
movement of 10fC, which a 1nA leakage current supplies in 10fC/1nA = 10µs.
• Exponential dependence of current on voltage makes pass transistor logic difficult to
use. nfets cannot pass ones and pfets cannot pass zeros; in particular, using nfets as
access transistors for static latches does not work.
parameter    negative                               positive
reduce Xj    increase Rs, RD                        decrease cjsw, cgso, cgdo
reduce Tox   decrease Vgsmax (gate-src breakdown)   increase k
             increase Cox (increase energy)
reduce NB    decrease Vdsmax (punchthrough)         decrease n
                                                    increase uo
                                                    decrease cj, cjsw, Tz
reduce Na    increase Rc                            decrease Vt
reduce ND    increase Rs, RD                        decrease cj, cjsw
Table 2: Process optimization opportunities.
• Fully static logic appears to work well. Transmission gate latches work nicely. SRAM
seems to work well, since one of the bitlines will be pulling down on a write.
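The hold-time estimate in the dynamic-circuit bullet above follows from Q = CV and t = Q/I. The sketch below reproduces it with the values quoted in the text; the function name is illustrative.

```python
def hold_time_s(cap_f, delta_v, leak_a):
    """Time for a leakage current leak_a to shift a node of cap_f by delta_v."""
    return cap_f * delta_v / leak_a


NODE_CAP_F = 100e-15   # 100fF dynamic storage node
VDD = 0.5              # supply voltage in volts
LEAK_A = 1e-9          # about 1nA leakage for a minimum size transistor
DROOP_V = 0.1          # tolerated change on the node, 100mV

stored_charge_c = NODE_CAP_F * VDD
droop_time_s = hold_time_s(NODE_CAP_F, DROOP_V, LEAK_A)
print(f"stored charge: {stored_charge_c * 1e15:.0f} fC")
print(f"time to droop by 100mV: {droop_time_s * 1e6:.0f} us")
```

A node that loses 100mV in about 10µs must be refreshed or re-evaluated at least that often, which is why dynamic circuits are hard to use when leakage is this large.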
9 Process Optimization
The opportunity exists to improve performance by optimizing fabrication processes for
low voltage operation. Carrier mobility degrades significantly in submicron processes as
channel doping is increased to prevent punchthrough in the presence of strong electric
fields. Reduced voltage operation results in weaker fields, permitting lower channel doping
which results in higher carrier mobility and increased transconductance.
Reduced voltage operation also permits lower diffusion doping, since higher diffusion
resistance will not impact circuit performance due to reduced transistor drain current.
This reduces diffusion capacitance to a negligible fraction of gate capacitance. The only
drawback of reducing diffusion doping is that lateral diffusion is reduced, increasing the
effective channel length. This is partially offset by the reduced Miller effect since the gate-
drain overlap capacitance is reduced. Table 2 summarizes the impact of various process
modifications on energy and performance.
While a lower bound of 60mV/decade is achievable at room temperature (dV =
nVT ln(10) with n = 1), dV is more typically 80mV/decade in 2µ CMOS and 90mV/decade
in 0.8µ CMOS. Tox/d0 can be reduced by reducing NB, since
$d_0 = \sqrt{2\epsilon_{si}\phi_s/(q N_B)}$, where $\phi_s = V_T \ln(N_B/n_i)$ and
$n_i = \sqrt{1.5 \times 10^{33}\, T^3 e^{-1.21/V_T}}$ [5].
Low gate, drain, and threshold voltages permit all doping concentrations to be reduced,
once again due to lower electric field strength. This has two benefits for low voltage
operation:
1. n is reduced, decreasing the subthreshold slope and thus reducing the supply voltage
(and therefore energy per operation) necessary to achieve the desired on/off current
ratio.
2. source/drain capacitances are reduced, further reducing energy per operation.
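The link between n, the subthreshold swing, and the supply voltage needed for a given on/off ratio can be sketched as follows. It assumes dV = nVT ln(10) and Ion/Ioff = e^(V/(nVT)) as used in this paper; the target ratio of 100 is an illustrative choice (it corresponds to ld/a for ld = 10, a = 0.1), and the function names are hypothetical.

```python
import math

VT_THERMAL = 0.026  # thermal voltage VT, in volts


def swing_per_decade(n):
    """Subthreshold swing dV = n*VT*ln(10), in volts per decade of current."""
    return n * VT_THERMAL * math.log(10)


def coupling_from_swing(swing_v):
    """Invert the swing relation to recover n from a measured swing."""
    return swing_v / (VT_THERMAL * math.log(10))


def supply_for_ratio(on_off_ratio, swing_v):
    """Supply voltage giving the requested Ion/Ioff at the given swing."""
    return swing_v * math.log10(on_off_ratio)


TARGET_RATIO = 100.0  # ASSUMED example: ld/a with ld = 10, a = 0.1
for label, swing in (("ideal, n = 1", swing_per_decade(1.0)),
                     ("80mV/decade", 0.080),
                     ("90mV/decade", 0.090)):
    print(f"{label:13s}: n = {coupling_from_swing(swing):.2f}, "
          f"Vdd for ratio 100 = {supply_for_ratio(TARGET_RATIO, swing) * 1e3:.0f} mV")
```

Because energy per operation scales with V^2, moving from 90mV/decade toward the 60mV/decade bound lowers the required supply for the same on/off ratio and reduces energy accordingly.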
10 Barriers to Practical Implementation
A number of practical considerations place a lower bound on supply voltage. These are:
external interfacing, controlling device thresholds, maintaining adequate noise margins,
power supply design, power consumption of OFF devices, and circuit speed. Multichip
module packaging provides the opportunity to isolate low-voltage subsystems from other
system components. Limits to low voltage operation may be determined to a large extent
by the power dissipation in level-shifting interface circuits. Device thresholds have been
observed to vary with transistor geometry and even location on a chip [3].
A 10 watt power supply will have to deliver 20 amps at Vdd = 500mV.
11 CIS Testchip
In the BiCMOS process at Stanford's Center for Integrated Systems, pfet gates are doped
p+ and nfet gates are doped n+. This means that if the channel implant is excluded,
both devices have thresholds close to zero volts. Vt can then be adjusted by adjusting
the substrate bias voltage. We have implemented a test chip which contains a number of
simple circuit structures (see Fig 10), and will hopefully have some results in time for the
conference. The chip has the following characteristics:
• Pfet gates doped p+ have Vt ≈ 0V
• Independent substrate and well biases
• self-testing convolutional coder
• ring oscillator
• VCO
• single nfet, pfet, nand, latch
Figure 10: Ultra Low Power test chip. Separate bias voltages together with zero-Vt pfets
permit threshold adjustment.
12 Conclusions
Submicron CMOS, together with 3D stacked multichip modules, and massively parallel
machines demand new approaches to power dissipation. We are in the very early stages of
investigating reducing energy by reducing supply and threshold voltages. We are hopeful
that low voltage CMOS can find widespread use in performance driven, power constrained
systems.
13 Acknowledgements
This research was supported in part by NASA grants NAGW1910 and NAGW419, by a
gift from Intel Corporation, and by a grant from Stanford's Center for Integrated Systems.
Multichip modules were provided by ATT, workstations by Sun Microsystems, and VLSI
fabrication by MOSIS.
References
[1] Carver A. Mead, "Analog VLSI and Neural Systems", Addison-Wesley, 1989.
[2] James B. Burr and Allen M. Peterson, "Energy considerations in multichip-module
based multiprocessors", IEEE International Symposium on Circuits and Systems,
1991.
[3] Aleksandra Pavasović and Andreas G. Andreou and Charles R. Westgate,
"Characterization of CMOS process variations by measuring subthreshold current",
Nondestructive Characterization of Materials IV, Plenum Press, 1991.
[4] Eric A. Vittoz, "Micropower techniques", Design of MOS VLSI Circuits for
Telecommunications, Prentice-Hall, 1985.
[5] James R. Pfiester, "Performance limits of CMOS very large scale integration", PhD
thesis, Stanford University, 1984.
[6] Richard M. Swanson, "Complementary MOS transistors in micropower circuits",
PhD thesis, Stanford University, 1974.
[7] David A. Hodges and Horace G. Jackson, "Analysis and Design of Digital Integrated
Circuits", McGraw-Hill, 1983.
