On The Design of Adiabatic SRAMs by Ye, Yibin et al.
Purdue University
Purdue e-Pubs
ECE Technical Reports Electrical and Computer Engineering
4-1-1996
On The Design of Adiabatic SRAMs
Yibin Ye
Purdue University School of Electrical and Computer Engineering
Dinesh Somasekhar
Purdue University School of Electrical and Computer Engineering
Kaushik Roy
Purdue University School of Electrical and Computer Engineering




On The Design of Adiabatic SRAMs * 









In the design of low-power circuits, adiabatic logic shows great promise. However, research
to date has concentrated on adiabatic logic circuits and families. Today's VLSI systems integrate
random logic, megamodules and memories. Hence, the success of adiabatic circuits will depend
on the efficient implementation of not only random logic, but also the other components of
a VLSI system. In this paper, we present a design of an adiabatic Static RAM, which can be
implemented without greatly increasing area or circuit complexity. The design addresses the
issue of building ultra-low power memory circuits in a VLSI system. Our results for a 4Kb
block of memory core indicate energy savings of approximately 75% for both read and write
operations. Higher power savings are achieved in the address decoder and I/O drivers.
*The research was supported in part by NSF and by ARPA. A preliminary version of the paper appeared in the 
1995 Low-Power Electronics Symposium. 
1 Introduction
With the recent trend toward portable communication and computing, power dissipation has
become one of the major design concerns along with area and performance. Research to reduce
power dissipation at various levels of design abstraction has started in earnest to achieve
ultra-low power with optimum performance [4]. In this paper we consider circuit-level techniques
for low-power SRAM design using the principles of adiabatic switching.
Power dissipation in static CMOS circuits can be best understood by considering the
inverter of figure 1(a). A logic ONE to ZERO transition on input x turns the p-mos transistor
on, and the output node, which is associated with capacitance C, is charged from 0 to Vdd.
With such a transition, $V_{dd}Q$ ($= CV_{dd}^2$) of energy is extracted from the supply, half of which,
$\frac{1}{2}CV_{dd}^2$, is stored in the capacitance temporarily, and the other half is dissipated in the path.
When the input experiences a LOW to HIGH transition, $\frac{1}{2}CV_{dd}^2$ of energy is again dissipated.
Hence, $CV_{dd}^2$ of energy is dissipated in an entire cycle. It should be observed that whenever
current experiences a voltage drop $\Delta V$, energy is dissipated at the rate of $i\Delta V$ (instantaneous
dissipative power), where i is the current. Such energy dissipation can be greatly reduced
by adiabatic switching. Let us consider the circuit of figure 1(b), where the supply
voltage swings gradually from 0 to Vdd (evaluation period), stays at Vdd for some time (hold
period) and then swings back from Vdd to 0 (restoration). If the output y at the beginning of
the evaluation period is at logic ZERO and input x is valid and is also equal to logic ZERO,
then the output node y follows the supply voltage to a logic ONE in such a way that there is very
little voltage drop across the channel of the p-mos transistor. Hence, only a small amount of
energy is dissipated. For a more detailed analysis and design of adiabatic circuits, the reader 
is referred to [6]. An effective model to estimate the power dissipation in adiabatic circuits is
given by [6]
where R is the effective channel resistance, T is the transition time, and Vt is the threshold
voltage of the MOSFET. The first term is referred to as the threshold loss, and the second term
is referred to as the resistive loss. The threshold loss may not be present in some circuits if
charging/discharging through the MOSFET switches does not experience a threshold voltage. Since
RC < 1 ns for a moderate fanout, and T ~ 1/f (f is the operating frequency), $E_{dissipation}$ is
very small when the operating frequency f ~ 10 MHz.
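As a rough numerical illustration of the two loss terms, the sketch below assumes the commonly quoted form of the model, $E_{dissipation} = \frac{1}{2}CV_t^2 + \frac{RC}{T}CV_{dd}^2$; this form is an assumption made here for illustration. The function names and the component values (C = 100 fF, RC = 1 ns, Vdd = 5 V, Vt = 1 V) are likewise hypothetical and are not taken from the paper.

```python
def conventional_energy(c, vdd):
    """Energy dissipated per full cycle by a static CMOS node: C * Vdd^2."""
    return c * vdd ** 2


def adiabatic_energy(c, vdd, vt, rc, t, threshold_loss=True):
    """Assumed model: threshold loss 0.5*C*Vt^2 plus resistive loss (RC/T)*C*Vdd^2."""
    resistive = (rc / t) * c * vdd ** 2
    threshold = 0.5 * c * vt ** 2 if threshold_loss else 0.0
    return threshold + resistive


if __name__ == "__main__":
    # Illustrative values only (assumptions, not from the paper).
    C, VDD, VT, RC = 100e-15, 5.0, 1.0, 1e-9
    for f in (1e6, 10e6, 100e6):
        T = 1.0 / f  # transition time taken as one clock period
        resistive_only = adiabatic_energy(C, VDD, VT, RC, T, threshold_loss=False)
        ratio = resistive_only / conventional_energy(C, VDD)
        print(f"f = {f / 1e6:5.0f} MHz: resistive loss / (C*Vdd^2) = {ratio:.4f}")
```

Under these assumptions the resistive term scales as RC/T, so with RC = 1 ns and T = 100 ns it is about one percent of the conventional $CV_{dd}^2$.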
The power supply/clock waveforms can be generated using some simple schemes [1, 3, 5]
which consist of two stages, as shown in figure 2. The first stage is the DC power supply,
and the second stage generates alternating current/clock waveforms and is controlled by
external clock signal(s) to maintain a constant frequency. Multiple phases of supply voltage
may be required to cascade such logic gates. If the circuit viewed by the power/clock generator
is modeled as a constant capacitive load, the entire system is effectively an RLC resonator. The
dissipated energy is replenished by the first-stage DC power supply by restoring the peak
voltage of the second stage to the Vdd (supply voltage) level in each cycle. The effective capacitance
of the whole circuit is approximately constant when the circuit size is sufficiently large. This
can be justified as follows. If the average number of nodes switching in a circuit per cycle
is N, then (for large N) from the central limit theorem [10] the number of switching nodes has
a nearly Gaussian distribution with a deviation of $K\sqrt{N}$, where K is a constant which depends
on the circuit structure and primary input signal patterns. Hence, the percentage deviation
from the average is proportional to $1/\sqrt{N}$, which is negligible when N is large. The total
load capacitance in a cycle is the sum of the capacitances associated with the switching nodes.
Thus, the percentage deviation of the total capacitance from its average is also negligible for
large circuits.
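The $1/\sqrt{N}$ argument can be checked with a small Monte Carlo sketch. It makes a simplifying assumption that is not in the paper: every node switches independently with the same probability in each cycle. The function name and parameters are hypothetical.

```python
import random
import statistics


def relative_deviation(num_nodes, switch_prob=0.5, cycles=400, seed=0):
    """Std. deviation of the per-cycle switching count divided by its mean.

    Assumes independent, identically distributed node activity (a simplification).
    """
    rng = random.Random(seed)
    counts = [
        sum(1 for _ in range(num_nodes) if rng.random() < switch_prob)
        for _ in range(cycles)
    ]
    return statistics.pstdev(counts) / statistics.mean(counts)


if __name__ == "__main__":
    for n in (100, 400, 1600):
        print(n, round(relative_deviation(n), 4))
```

With a 16 times larger circuit the relative deviation should shrink by roughly a factor of 4, in line with the $1/\sqrt{N}$ scaling.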
Numerous designs of adiabatic logic have been presented in [1, 3, 2, 5, 6], which have
demonstrated the possibility of achieving ultra-low energy computing. Today's VLSI systems
integrate both random logic and assorted memories. Hence, it is natural to apply the adiabatic
switching principle to memories to achieve savings as large as those in random logic. However,
the application of such methods should not cause drastic increases in either size or circuit
complexity. We address this issue in this paper by presenting the design of a Static RAM
which is capable of recovering on the order of 75% of the energy for both read and write operations.
This is achieved without increasing the complexity of the memory cell and with a low area
overhead over conventional SRAM.
The rest of the paper is organized as follows. Section 2 describes the design of the SRAM
memory core working in adiabatic fashion. An adiabatic address decoding scheme is described
in section 3. The design of peripheral circuits is presented in section 4. A method to find the
optimal supply voltage is derived in section 5. Quantitative results on the performance of our
designs are detailed in section 6 based on layouts using the MOSIS 1.2 μm CMOS n-well process.
In section 7, we summarize certain features of our adiabatic SRAM.
2 Adiabatic SRAM Core
In this section, we first briefly describe the organization of the SRAM, then we describe the
topology of the memory cell and SRAM core in detail. Figure 3 shows the adiabatic SRAM
organization. Compared to the standard CMOS SRAM, a row driver is inserted, which
generates appropriate voltage signals to drive the memory core. Also note that sense amplifiers are
replaced by voltage level shifters.
Figure 4 shows the memory cell, which is identical in topology to the 6-transistor RAM
cell used in the standard SRAM. The cell consists of a cross-coupled inverter pair and a pair
of read/write access transistors. The pair of access transistors is enabled by the word line.
A block of SRAM core is composed of a multiplicity of these cells arrayed horizontally and
vertically, and the memory core is in turn composed of multiple blocks. Within a block, the
word lines of adjacent cells are connected along the horizontal axis, while the bit and $\overline{bit}$ lines
are connected for all cells in a column.
With reference to figure 4, a conventional SRAM ties Vhi and Vlow to the supply Vdd and
ground. For a discussion on the operation of a six-transistor SRAM, we refer readers to [7].
The design of the cell ratios the sizes of M3 and M5 (M4 and M6) so that the value stored
in the cell is not upset. Layout of the memory core emphasizes compactness and is usually
flipped for two adjacent rows to share the power supply. The dominant component of energy
dissipation arises from switching the large capacitance on the bit lines and the word lines. Other
significant components of energy consumption are the row and column address decoders and the
sense amplifier. This paper concentrates on reducing the energy required in each component
by applying the adiabatic switching principle. Two types of RAM cells are presented in this
section, each of which has advantages depending on the overall architecture of the SRAM.
2.1 Operations of Adiabatic SRAM Core 
In the adiabatic SRAM core of figure 4, Vhi and Vlow are no longer static. Vhi, Vlow and Vword
are generated by the row driver circuitry, which is shown in figure 6. The row driver will be
discussed in detail in section 4. For the time being, it is sufficient for readers to realize the
following: row selection signals $W_0, W_1, \ldots, W_{M-1}$, which are generated by the row address
decoder, enable the drivers for a particular row. Vhi, Vlow, and Vword of the enabled row may
now be controlled independently by the global supply lines Ghi, Glow, and Gword, respectively.
For the unselected rows, Vhi, Vlow and Vword are connected to the static power supply lines
Shi, Slow, and ground, respectively. Figure 4 also shows the bit-line equalization transistor.
Bit-line precharge circuits of the standard CMOS SRAM are not required in the adiabatic SRAM.
We now show that by the proper application of stimulus at Ghi, Glow, and Gword, it is
possible to operate the memory core in an adiabatic fashion. For the purpose of discussion,
we shall assume DC supply voltages of Shi = Vdd = 5 volts and Slow = 2 volts, and a transistor
threshold |Vt| of 1 volt. The SRAM core starts out in the rest state with all rows disabled.
The row driver circuit ensures that Vhi is at Shi = 5 volts, Vlow is at Slow = 2 volts, and Vword is
pulled to GND. The bit-line is assumed precharged midway to 2 volts. A read operation starts
with the row selection being applied, and Vword being smoothly ramped up to 3 volts by
Gword. Vhi and Vlow are now ramped down to 3 volts and 0 volt, respectively. The waveforms
are shown in figure 7. The reader can also refer to the circuit model of figure 13(b) for the read
operation. If we assume that internal node A was LOW (= Vlow) and B was HIGH (= Vhi), both
M1 and M5 are ON and hence bit follows Vlow, smoothly ramping down to 0. On the other
hand, $\overline{bit}$ remains at 2 volts since node B is at Vhi (≥ 3 volts) and Vword is at 3 volts, which
prevents access transistor M6 from turning on. The bit-line differential is amplified by an
adiabatic level-shifter to generate the logic output. Unlike the conventional SRAM, where
pre-charge circuitry is required to precharge the bit-lines after each operation, bit-lines are
charged back to 2 volts by the same cells being read. Indeed, bit reverts back to the rest state
through M5 and M1 when Vhi and Vlow ramp up to their rest state. This process replenishes
the charge on bit and hence, the bit-line peripheral circuitry can be eliminated in the adiabatic
SRAM. Subsequently, Gword is pulled low, turning OFF the row and equalizing the bit-lines.
The stimulus applied on Vhi and Vlow ensures a constant cell voltage at all times.
The write operation requires that the bit of information stored in a cell be overwritten by
signals carried on the bit-lines. This is accomplished by applying the row selection, pulling up
Vword to enable the word line, and pulling down Vhi to 3 volts. Thus, the voltage difference
across the cell is Vhi - Vlow = 1 volt = Vt, which is high enough to ensure that the cell state is held
for columns which are not being written, and low enough to easily flip the cell state for the
selected columns. We now simultaneously and smoothly ramp down $\overline{bit}$ and Vlow from 2 volts
to 0 volt. B is then pulled down by $\overline{bit}$ and the cell state flips (as we have assumed in the
read operation, A was LOW and B was HIGH). Thus, we write a 0 into B, as illustrated in
figure 7. Returning to the rest state is accomplished in a fashion similar to the read operation
when Vlow and $\overline{bit}$ revert back to 2 volts and the word line is disabled.
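The statement that the read stimulus keeps the cell voltage constant can be illustrated with a minimal sketch of the rail voltages during the read ramp. It uses the example levels from the text (Shi = 5 V, Slow = 2 V, a 2 V swing) and assumes ideal linear ramps sampled at a few points; the helper names are hypothetical.

```python
def ramp(v_start, v_end, steps):
    """Linearly spaced voltages from v_start to v_end (inclusive)."""
    return [v_start + (v_end - v_start) * k / (steps - 1) for k in range(steps)]


def read_ramp(steps=5, s_hi=5.0, s_low=2.0, swing=2.0):
    """Rail voltages while Vhi and Vlow ramp down together during a read.

    Returns a list of (Vhi, Vlow, Vhi - Vlow) samples.
    """
    v_hi = ramp(s_hi, s_hi - swing, steps)
    v_low = ramp(s_low, s_low - swing, steps)
    return [(hi, lo, hi - lo) for hi, lo in zip(v_hi, v_low)]


if __name__ == "__main__":
    for v_hi, v_low, v_cell in read_ramp():
        print(f"Vhi = {v_hi:.1f} V, Vlow = {v_low:.1f} V, cell voltage = {v_cell:.1f} V")
```

Because both rails move by the same amount at every sample, the difference Vhi - Vlow stays at 3 V throughout the ramp.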
2.2 Another Core Organization - Column Activated Memory Core
One phenomenon in the operation of an SRAM is that all cells in the selected row within a
memory block are activated. Although only the selected columns in a block are multiplexed to
sense amplifiers and I/O buffers, all columns are active, discharging the bit-lines and charging
them back to high subsequently. For instance, considering a 64-by-64 block of memory core
in an SRAM chip with 8 I/O pins, a read only reads out 8 bit-lines while the remaining 56
bit-lines also perform the read operation. The energy involved in charging these 56 bit-lines is
actually wasted. If we only activate the selected columns, less energy will be dissipated in each
operation. Motivated by this, we present another core organization which only enables the
selected columns and consumes significantly less energy under certain memory organizations.
Figure 5 shows this new memory core organization. The only modification from the core
configuration of figure 4 is that Vhi and Vlow now run vertically and are generated by column
driver circuitry, which can be implemented analogously to the row driver circuitry. The selection
signals generated from the column address decoder enable the drivers for the selected columns.
Vhi and Vlow for these columns are then controlled by Ghi and Glow, respectively. A row driver
is still required because Vword runs horizontally. The cell structure remains the same. The
memory core starts out in the rest state with all rows disabled. The column driver circuitry
ensures that Vhi is at Shi = 5 volts and Vlow is at Slow = 2 volts, and Vword is pulled down to
GND by the row driver circuitry. A read operation starts with Vword in the selected row being
ramped up to 2.5 volts by Gword. Vhi and Vlow in the selected columns are now ramped down
to 3 volts and 0 volt, respectively. The selected columns operate identically as described in
section 2.1. Let us consider the unselected columns. The gate terminals of access transistors
M5 and M6 are tied to Vword and are at 2.5 volts during the read operation. However, Vhi and
Vlow of the cell are not ramping and stay at 5 volts and 2 volts, respectively. Assume that
internal node A is at LOW = Vlow and B at HIGH = Vhi (refer to the cell shown in figure 5).
Since Vword ramps up to 2.5 volts only, neither of the access transistors M5 and M6 is turned
on. Thus, the pair of bit-lines of the cell stays at the rest state of 2 volts. We have obtained a
memory core in which only the selected columns are active. The write operation is performed in
a similar way such that only the columns being written are active.
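The on/off argument for the access transistors can be written as a one-line check. The sketch below assumes an idealized n-MOS model (conducting only when the gate exceeds the lower of its two channel terminals by more than Vt = 1 V, with no body effect); the function name is hypothetical and the voltages are the example levels from the text.

```python
def nmos_is_on(v_gate, v_terminal_a, v_terminal_b, vt=1.0):
    """Idealized n-MOS: conducts when Vgate - min(channel terminals) > Vt."""
    return v_gate - min(v_terminal_a, v_terminal_b) > vt


if __name__ == "__main__":
    v_word = 2.5
    # Unselected column: rails stay at Vhi = 5 V, Vlow = 2 V; bit-lines rest at 2 V.
    print("unselected M5:", nmos_is_on(v_word, 2.0, 2.0))  # node A = Vlow = 2 V
    print("unselected M6:", nmos_is_on(v_word, 5.0, 2.0))  # node B = Vhi = 5 V
    # Selected column at the end of the ramp: Vhi = 3 V, Vlow = 0 V.
    print("selected M5:", nmos_is_on(v_word, 0.0, 2.0))    # node A = Vlow = 0 V
    print("selected M6:", nmos_is_on(v_word, 3.0, 2.0))    # node B = Vhi = 3 V
```

Under this model only M5 of a selected column turns on, which is the behavior the paragraph describes.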
In this core organization, each active column is driven by its own driver, and hence only
small transmission gates are required to drive Vhi and Vlow for each column in the column
driver circuitry. As a comparison, the previous core organization needs substantially larger
transmission gates in the row driver circuitry because Vhi and Vlow in one selected row are
driving all the bit-lines (remember that every cell in the selected row is discharging and charging
its bit-line). Cell size in this new core organization is slightly larger than in the previous case.
The actual power advantage depends on the overall SRAM memory organization, which will
be further discussed in section 2.4.
2.3 Energy Dissipation in Memory Core 
The choice of voltages which we have described earlier is governed by having a safe differential
Vkeep (selected as 1 volt) across the cell to hold its state. Assuming that Vdd is the supply
voltage, Vhi swings between Vdd and (Vdd + Vkeep)/2, and Vlow swings between (Vdd - Vkeep)/2
and 0. Very high energy recovery can be achieved if the read process is gradual enough. This
is because the charging and discharging paths for the read operation are identical, and no cell
flips its state. Hence, the energy dissipation is dominated by resistive loss, which is modeled
by equation (1). Threshold loss is negligible due to the fact that Vword - Vlow ≈ Vt at the
beginning of the read operation.
The write operation is not truly reversible. The bit of information stored in the cell is
destroyed by switching the inverter loop through a voltage of Vkeep. Thus, there is a certain
amount of energy dissipated in the cell being written, which is inevitable because of the
irreversible nature of erasing information. However, the capacitances of the long bit-lines
are much more significant than the capacitance of a single cell. Hence, the major portion
of charging and discharging is performed adiabatically. The above analysis shows that
the energy consumption of the memory core for both read and write operations can be greatly
reduced if the processes are sufficiently slow. Resistive loss is the dominant factor of energy
dissipation, and it is inversely proportional to the signal transition time T.
2.4 Comparison of two Memory Core Organizations
We now compare the energy dissipation of the two memory core organizations, which might have
a major impact on the total power dissipation and chip architecture. If the signal transition
time T and the supply voltage Vdd are fixed and the same for both core organizations, the
energy dissipation depends on the capacitances to be charged/discharged, which are modeled
and compared below. The main difference between the two core organizations is that the
second one activates selected columns only, which usually make up a small portion of the
total number of columns in a memory core block. However, all cells on the selected columns
participate in the voltage swings, while in the first core organization, only those cells in a single
selected row are enabled. As a matter of fact, the total capacitance associated with the cells in
one column is more significant than the peripheral capacitance of the bit-line, which implies
that the actual capacitance being charged during an operation depends on the geometry of
the core block. Assume that both designs have the same recovery rate, and one column of
cells is associated with a capacitance of Ccell and the bit-line has a peripheral capacitance of
Cbit. Further assume that a block of memory core consists of M-by-N bits, with n bits read or
written in each operation. Then the ratio of the effective capacitances of the two core organizations
is given by:
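$$\frac{C_{\text{1st organization}}}{C_{\text{2nd organization}}} = \frac{N C_{bit} + (N/M)\, C_{cell}}{n\,(C_{bit} + C_{cell})} = \left(\frac{N}{n}\right) \frac{C_{bit} + C_{cell}/M}{C_{bit} + C_{cell}} \qquad (2)$$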
Figure 16 compares the SPICE simulation results of the two core organizations for a 64-by-64
block (with n = 8 I/O pins). The column activated approach consumes slightly less energy
than the regular row activated organization. However, as one can observe from equation 2,
when the ratio (M/n) is large, the second approach will certainly be superior to the first
one; e.g., in a 256-by-256 block, the row activated approach would consume approximately 4
times the energy of its column activated counterpart. Thus, our designs provide room
for architectural level optimization of the SRAM.
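The capacitance ratio can be evaluated with a few lines of code. The sketch below implements the ratio as (N·Cbit + (N/M)·Ccell) / (n·(Cbit + Ccell)); the capacitance values are hypothetical, chosen only so that Ccell is several times Cbit, and both are assumed to scale with the number of rows when the block grows. None of these numbers come from the paper.

```python
def capacitance_ratio(m_rows, n_cols, n_bits, c_bit, c_cell):
    """Effective capacitance of the row-activated core over the column-activated core."""
    row_activated = n_cols * c_bit + (n_cols / m_rows) * c_cell
    column_activated = n_bits * (c_bit + c_cell)
    return row_activated / column_activated


if __name__ == "__main__":
    # Hypothetical per-column capacitances in arbitrary units (assumption).
    print("64 x 64:  ", round(capacitance_ratio(64, 64, 8, c_bit=1.0, c_cell=7.0), 3))
    # Four times as many rows: both per-column capacitances assumed to scale by 4.
    print("256 x 256:", round(capacitance_ratio(256, 256, 8, c_bit=4.0, c_cell=28.0), 3))
```

With these assumed values the ratio is slightly above 1 for the 64-by-64 block and about 4 for the 256-by-256 block, which matches the trend described in the text.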
3 Adiabatic Address Decoder
Another major component of power dissipation in SRAMs is due to the address decoder. In
order to reduce the total power consumption, it is desirable to design an address decoder which
can be operated adiabatically as well. Figure 8 shows two implementations of a 2-to-4 adiabatic
address decoder: a NAND and a NOR implementation. The only difference in the configuration
of the adiabatic decoder from the conventional dynamic decoder is that the precharge p-mos
transistors are replaced by n-mos transistors, which no longer function as precharge transistors.
Let us first show how the NOR decoder can operate in the adiabatic fashion. The decoder
starts out in the rest state with Vlow at Vdd and all row lines, W0, W1, W2, W3, at Vdd - Vt
(guaranteed by the transistors in the left-most column). After the address signals settle down,
Vlow gradually swings down from Vdd to 0, and all the row lines follow Vlow down to 0 except
the selected row, which stays at Vdd - Vt. Note that the transistors in the left-most column are
disabled in this discharging process since their gate terminals are also tied to Vlow. During the
period when Vlow stays at 0 volt, the row selection signals, W0, W1, W2, W3, are valid and are
sampled by the row driver, which then enables the selected row in the memory core. Reverting
back to the rest state is accomplished by ramping up Vlow to Vdd, which pulls up all row lines
to their rest state at Vdd - Vt. Subsequently all address signals return to zero. The waveforms
of the adiabatic NOR decoder are shown in figure 9.
In a similar fashion, the adiabatic decoder can be implemented as a NAND array, which
is shown in figure 8(b). Again, the precharge transistor in each row of a dynamic NAND
decoder is replaced by an n-mos transistor, which ensures that every row line returns to HIGH
at Vdd - Vt after a decoding operation. The operation is similar to that of the NOR decoder.
In the NAND decoder, however, only one row is selected each time. The selected row line
follows Vlow ramping down to 0 while all the other row lines stay at the rest state voltage
of Vdd - Vt. Because of this, the adiabatic NAND decoder consumes much less energy than the
NOR decoder. Results from the SPICE simulations of 6-to-64 adiabatic NOR and NAND
decoders are detailed in section 6. In a NAND structure decoder, the transistors in each row are
serially connected, which makes it inappropriate for use in large decoders. Hence, the
choice between the NOR and NAND structure depends on the size of the decoder and the speed
required.
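The difference in switching activity between the two decoder styles can be made concrete with a behavioral sketch. It models only the row-line levels reached while Vlow is held at 0 (not the transistor network), using Vdd = 5 V and Vt = 1 V as example values; the function names are hypothetical.

```python
def row_line_levels(style, selected, num_rows, vdd=5.0, vt=1.0):
    """Row-line voltages while Vlow is at 0, for a behavioral NOR or NAND decoder model."""
    rest = vdd - vt
    if style == "NOR":
        # Every unselected line follows Vlow to 0; the selected line stays at rest.
        return [rest if row == selected else 0.0 for row in range(num_rows)]
    if style == "NAND":
        # Only the selected line follows Vlow to 0; all others stay at rest.
        return [0.0 if row == selected else rest for row in range(num_rows)]
    raise ValueError(f"unknown decoder style: {style}")


def lines_switched(style, selected, num_rows, vdd=5.0, vt=1.0):
    """Number of row lines that leave the rest level Vdd - Vt during one decode."""
    rest = vdd - vt
    levels = row_line_levels(style, selected, num_rows, vdd, vt)
    return sum(1 for level in levels if level != rest)


if __name__ == "__main__":
    for style in ("NOR", "NAND"):
        print(style, lines_switched(style, selected=5, num_rows=64))
```

For a 6-to-64 decoder this model has 63 row lines swinging per access in the NOR style and a single line in the NAND style, which is the reason given for the lower NAND energy.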
Adiabatic decoders can also be realized in two stages with pre-decoding, in which a smaller
number of transistors is used. However, two phases of decoding have to be introduced as
well, hence the total time required for address decoding is twice the transition time
of Vlow. By using the pre-decoding stage, the total effective capacitance of the decoder is
substantially smaller since fewer transistors are required. Moreover, the RC time
constant in each stage is reduced, which results in a higher percentage of energy recovery. A
two-level implementation is a better choice for adiabatic decoders of large size.
Unlike the standard CMOS SRAM, where the row decoder and column decoder are usually
implemented in different styles, the same decoding scheme is adopted for both row and column
decoders in our adiabatic SRAM design. It is also interesting to note that the geometric
structure of the address decoder is identical to that of ROMs, differing only in the data pattern.
Thus, the same designs for the address decoder can be applied to adiabatic ROMs.
4 Design of Peripheral Circuits
4.1 Adiabatic Level-Shifter and I/O Buffer
The pair of bit lines from the memory core has a voltage difference of 2 volts in our designs,
which needs to be amplified and buffered in order to drive the I/O bus lines. In standard CMOS
SRAMs, sense amplifiers (SA) are used to amplify the small voltage difference to full scale.
The sense amplifier is usually clocked by an enable signal generated by the Address Transition
Detection circuitry (ATD) to reduce the energy loss due to short circuit current. We use a
slightly different approach in order to minimize the short circuit current and avoid the use of
ATD circuitry, which consumes a considerable amount of power [8].
The approach we use is shown in figure 10, which consists of two stages, a voltage
level-shifter and a buffer. The level-shifter shown is essentially a p-mos cross-coupled sense amplifier
with two access transistors. It functions as follows: initially the level-shifter supply is at 0 volt and the shifter is
disabled. The two internal nodes A and $\overline{A}$ are also equalized. After the access transistors are
turned on and bit and $\overline{bit}$ arrive, the voltage difference between the two sides of the cross-coupled
sense amplifier is built. Then we ramp the level-shifter supply gradually from 0 volts to Vdd. Since the voltage
difference has been built before the sense amplifier is enabled, short circuit current is not
significant due to the positive feedback effect. As we can see from figure 11, node A is pulled
up very rapidly and the stable state is achieved immediately, hence the short circuit current is
negligible. Subsequently, the two access transistors are turned OFF to isolate the level-shifter
from the bit lines such that A and $\overline{A}$ hold their states while the bit lines return to the rest
state. The buffer is now ready to drive the I/O bus line. The transmission gate construction
of the buffer ensures that the charging and discharging processes are performed adiabatically,
and simulation results indicate that more than 00% of energy recovery can be achieved. Due
to the threshold voltage of the transistors, A is not able to revert back to 0 volt (it stops at Vt
instead) when the level-shifter supply returns to 0 volt. Equalization is then applied to A and $\overline{A}$. However, the
energy loss in the level-shifter is not significant, since it only drives a buffer. The major portion
of charging and discharging is performed adiabatically.
4.2 Row Driver Circuitry
We now consider another interface circuit, the row driver, which uses the signals from the
row address decoder to enable a particular row of the memory core. Each row of the row
driver consists of two stages, a standard CMOS tri-state buffer and adiabatic transmission
gate drivers, respectively. Let us consider the tri-state buffer first and assume that the NAND
structure decoding scheme is used. The decoded signal W for the selected row is LOW at 0 volt
when it is in the valid period. Then the enable signal WEN switches from 0 to Vdd, which
enables the tri-state buffer and results in SEL = Vdd and $\overline{SEL}$ = 0 volt. Unlike other signals
which switch gradually during a transition, WEN switches as a step function to avoid the short
circuit current in both NAND and NOR gates. While W is still valid, WEN switches back to
0, which simultaneously turns off the p-mos and the n-mos transistors. Thus, SEL and $\overline{SEL}$
are disconnected from their input and kept at Vdd and 0, respectively. The reason for using
the standard CMOS tri-state buffer is as follows: only one of the M rows is selected at each
time, hence at most two rows have a transition, i.e., the row selected in the last cycle is unselected,
and a new row is selected. Therefore, the remaining M - 2 rows do not have a transition when
the tri-state buffer is used, and hence do not consume energy (ignoring the leakage current).
The second stage of the row driver consists of three transmission gate drivers for Vhi, Vlow and Vword,
respectively. For the particular row which is selected, SEL = Vdd and $\overline{SEL}$ = 0. Hence, Vhi,
Vlow, and Vword are controlled by Ghi, Glow, and Gword, respectively. For all other M - 1 rows,
Vhi, Vlow, and Vword are connected to the DC supplies Shi, Slow, and GND, respectively.
At this point, we have discussed our adiabatic SRAM from address input to data output. 
The overall clocking scheme for the SRAM is shown in figure 12.
5 Optimal Voltage Selection 
In conventional CMOS digital systems, energy consumption decreases in proportion to the
square of the supply voltage Vdd. This is NOT the case for adiabatic circuits in general. Instead,
there might exist an optimal voltage swing which leads to the minimal energy dissipation in
adiabatic circuits with certain constructs [9]. In this section, we re-examine the optimal voltage
swing problem and derive a method to find the optimal voltage for the adiabatic SRAMs, as
well as for general adiabatic circuits.
Let us consider charging a load capacitance CL through a MOSFET to deliver a charge
of CL Vdd over a time period T. The energy dissipation through the channel of the MOSFET
is given by:
where g is the MOSFET channel conductance, $\Delta V$ is the voltage drop across the channel and
<.> denotes the average over the time period T. On the other hand,
Hence,
We use the following approximation to simplify $E_{diss}$,
Substituting equations 5 and 6 into equation 3,
The approximation of equation 6 turns out to be exact if $\Delta V$ is constant throughout the
charging process. More detailed analysis and simulations suggest only a small variance of $\Delta V$,
which supports the accuracy of the approximation.
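The displayed equations referenced in this derivation can be reconstructed as follows. This is a sketch consistent with the definitions given in the text (channel conductance g, voltage drop $\Delta V$, time average over T, delivered charge $C_L V_{dd}$); the numbering (3) to (7) is assumed from the in-text references, and the notation is not necessarily the authors' own.

```latex
\begin{align}
E_{diss} &= \int_0^T g\,\Delta V^2\,dt = T\,\langle g\,\Delta V^2\rangle \tag{3}\\
C_L V_{dd} &= \int_0^T g\,\Delta V\,dt = T\,\langle g\,\Delta V\rangle \tag{4}\\
\langle g\,\Delta V\rangle &= \frac{C_L V_{dd}}{T} \tag{5}\\
\langle g\,\Delta V^2\rangle &\approx \frac{\langle g\,\Delta V\rangle^2}{\langle g\rangle} \tag{6}\\
E_{diss} &\approx \frac{C_L^2 V_{dd}^2}{T\,\langle g\rangle} \tag{7}
\end{align}
```

In this form, approximation (6) becomes an equality when $\Delta V$ is constant, since both sides then reduce to $\Delta V^2 \langle g \rangle$, which agrees with the remark in the text.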
We now consider the transmission gate shown in figure 13(a), which is a fundamental circuit
construct in many adiabatic approaches, driving a capacitive load CL. The conductance of
the n-MOS channel is
where $k_n$ is a constant, $(\mu_n \epsilon / t_{ox})(W/L)$. The average conductance is then given by:
The waveform has been assumed to be switching linearly so that the average over time can be
obtained by integrating over voltage. Similarly,
Thus, we have
and hence, the energy dissipation is:
The energy dissipation $E_{T-gate}$ has a minimum at Vdd = 3Vt. When the difference in threshold
voltage between n-MOS and p-MOS is accounted for, the optimal Vdd lies between $3V_{tn}$ and
$3V_{tp}$. Although the second order effects (e.g. the body effect) have not been taken into
account, SPICE simulations verify that 3Vt is close to the minimum energy dissipation. In our
SRAM designs, transmission gate drivers have been used in the I/O buffers, the supply drivers for
Vhi and Vlow of the memory core, the word-line drivers and the address signal drivers.
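The location of the minimum at Vdd = 3Vt can be checked numerically. The sketch below rests on assumptions that are stated here rather than taken from the paper: a linear-region conductance k(Vdd - V - Vt) for the n-MOS (zero once it cuts off), a mirror-image p-MOS with the same threshold magnitude so that its ramp average is identical, and a dissipation of the form (CL·Vdd)^2 / (T·<g>). All names and normalized values are hypothetical.

```python
def average_conductance(vdd, vt, k=1.0, samples=1000):
    """Average of k*(Vdd - V - Vt) over a linear ramp of V from 0 to Vdd.

    The device is assumed to cut off (zero conductance) once V > Vdd - Vt.
    """
    total = 0.0
    for i in range(samples):
        v = vdd * (i + 0.5) / samples  # midpoint rule over the ramp
        total += max(0.0, k * (vdd - v - vt))
    return total / samples


def tgate_energy(vdd, vt, c_load=1.0, t=1.0, kn=1.0, kp=1.0):
    """Assumed dissipation (CL*Vdd)^2 / (T*<g>) with <g> = <g_n> + <g_p>."""
    # By symmetry the p-MOS average over the ramp equals the n-MOS expression.
    g_avg = average_conductance(vdd, vt, kn) + average_conductance(vdd, vt, kp)
    return (c_load * vdd) ** 2 / (t * g_avg)


def optimal_vdd(vt, lo=1.5, hi=8.0, steps=326):
    """Grid search for the supply voltage (between lo*Vt and hi*Vt) minimizing the energy."""
    grid = [vt * (lo + (hi - lo) * i / (steps - 1)) for i in range(steps)]
    return min(grid, key=lambda v: tgate_energy(v, vt))


if __name__ == "__main__":
    print("optimal Vdd / Vt =", round(optimal_vdd(1.0), 2))
```

Under these assumptions the energy is proportional to Vdd^3 / (Vdd - Vt)^2, whose derivative vanishes at Vdd = 3Vt, so the grid search should land close to 3.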
Let us now consider another major component of energy dissipation: the bit lines in the
memory core. The circuit model for discharging a bit-line capacitance is shown in figure 13(b). The
conductances of transistors M1 and M5 are
where $V_A \approx V_{low}$. Set Vkeep = Vt. We have $S_{word} - V_{keep} = S_{word} - V_t = S_{low}$. The series
connection of M1 and M5 gives:
$$g = \frac{g_{M1}\, g_{M5}}{g_{M1} + g_{M5}} = \frac{k_n (S_{word} - V_t)(S_{low} - V_{low})}{S_{word} + S_{low} - V_t - V_{low}} = k_n\left[(S_{word} - V_t) - \frac{(S_{word} - V_t)^2}{S_{word} + S_{low} - V_t - V_{low}}\right] \qquad (14)$$
Hence,
$$\langle g \rangle = k_n (S_{word} - V_t) - \frac{k_n (S_{word} - V_t)^2}{S_{low}} \int_0^{S_{low}} \frac{dV_{low}}{S_{word} + S_{low} - V_t - V_{low}} = k_n (S_{word} - V_t) - \frac{k_n (S_{word} - V_t)^2}{S_{low}} \ln\left(\frac{S_{word} + S_{low} - V_t}{S_{word} - V_t}\right) \qquad (15)$$
On the other hand, We have Sword = (Vdd+Vkeep)/2 = (Vdd+&)/2 and Slow = (lVdd-Vkeep)/:! = 
(Vdd - &)/2. Substituting Slow and SIuord into the above equation gives, 
Thus, we obtain the energy dissipation on a bit-line to be
which is proportional to Vdd - Vt. Hence, the smaller the voltage swing, the smaller the energy
dissipation. However, there are other constraints on the voltage swing. With the level-shifter
we designed, Slow > Vt must hold for it to work properly, which results in Vdd > 3Vt.
Therefore, for our design of the SRAM, the optimal voltage swing lies between 3Vt and 4Vt.
A more accurate value can be determined by simulations and also depends on the particular
design of the level shifter (recall that there are plenty of choices for the level-shifter).
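
As a numerical cross-check of the bit-line derivation, the following Python sketch averages the series conductance over Vlow from 0 to Slow with a midpoint rule and compares the result with the closed form of equation (15). It also compares it with the value k_n (1 - ln 2)(Vdd - Vt)/2 that the closed form reduces to when Sword - Vt = Slow. The device constant K_N and the values chosen for Vdd and Vt are hypothetical and serve only as an illustration; the function names are not from the paper.

```python
import math

def series_conductance(k_n, s_word, s_low, v_t, v_low):
    """Series combination of the two transistor conductances at a given V_low."""
    g_a = k_n * (s_word - v_t)   # term k_n (S_word - V_t)
    g_b = k_n * (s_low - v_low)  # term k_n (S_low - V_low)
    return g_a * g_b / (g_a + g_b)

def average_conductance_numeric(k_n, s_word, s_low, v_t, steps=20_000):
    """Midpoint-rule average of the series conductance for V_low in [0, S_low]."""
    h = s_low / steps
    total = sum(
        series_conductance(k_n, s_word, s_low, v_t, (i + 0.5) * h)
        for i in range(steps)
    )
    return total * h / s_low

def average_conductance_closed_form(k_n, s_word, s_low, v_t):
    """Closed form corresponding to equation (15)."""
    a = s_word - v_t
    return k_n * a - k_n * a * a / s_low * math.log((a + s_low) / a)

# Hypothetical illustration values (assumptions, not taken from the paper).
K_N = 1.0e-4   # transconductance parameter, A/V
V_T = 1.0      # threshold voltage, V
V_DD = 3.5     # supply swing, V (between 3*V_T and 4*V_T)

S_WORD = (V_DD + V_T) / 2   # with Vkeep = Vt
S_LOW = (V_DD - V_T) / 2

numeric = average_conductance_numeric(K_N, S_WORD, S_LOW, V_T)
closed = average_conductance_closed_form(K_N, S_WORD, S_LOW, V_T)
simplified = K_N * (1 - math.log(2)) * (V_DD - V_T) / 2

print(f"numeric average : {numeric:.6e} S")
print(f"closed form (15): {closed:.6e} S")
print(f"simplified form : {simplified:.6e} S")
```

Under these assumptions the three values agree, and the averaged conductance grows linearly with Vdd - Vt. The chosen Vdd also satisfies the level-shifter constraint Slow > Vt.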
6 Implementation and Results
We implemented the circuits using the MOSIS 1.2 µm CMOS process. The layout of the cell of the
two core organizations (referred to as design 1 and design 2) is shown in figure 14. The cell size
of the two designs is 27λ×40λ and 32λ×38λ, respectively, where λ = 0.6 µm. Unlike standard
RAM cells, a memory cell working in an adiabatic fashion does not require a high aspect
ratio between the pass and the pull-down transistors. The ratio of transistors M3 and M5 (M4
and M6) is unity in our designs. Although the topology of the cell in design 1 is identical to a
standard 6-transistor RAM cell, the fact that separate Vhi and Vlow lines are needed for each
row results in additional area overhead. For comparison, a conventional cell takes 27λ×36.6λ
of area. The cell of design 2 takes the largest area because the Vhi and Vlow lines are laid out
vertically.
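
The area overhead implied by these cell dimensions can be worked out directly. The short Python sketch below converts the three cell sizes quoted above from λ units to square micrometres and expresses the two adiabatic cells relative to the conventional one. The dictionary and function names are illustrative, not part of the original work.

```python
LAMBDA_UM = 0.6  # lambda in micrometres, as stated in the text

# Cell dimensions (width, height) in units of lambda, taken from the text.
CELLS = {
    "design 1": (27.0, 40.0),
    "design 2": (32.0, 38.0),
    "conventional": (27.0, 36.6),
}

def area_um2(width_lambda, height_lambda, lam=LAMBDA_UM):
    """Cell area in square micrometres."""
    return (width_lambda * lam) * (height_lambda * lam)

def overhead_percent(name, reference="conventional"):
    """Area of cell `name` relative to the reference cell, as a percentage increase."""
    return 100.0 * (area_um2(*CELLS[name]) / area_um2(*CELLS[reference]) - 1.0)

for name, dims in CELLS.items():
    print(f"{name:12s}: {area_um2(*dims):7.2f} um^2, "
          f"{overhead_percent(name):5.2f}% over conventional")
```

With these figures, design 1 occupies roughly 9% more area than the conventional cell and design 2 roughly 23% more.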
SPICE simulations were carried out on extracted sub-circuits for a group of four SRAM cells
for both core organizations. Full parasitic extraction was performed. We simulated a block
of SRAM core of 64 rows by 64 columns by accounting for the various capacitive loads on the lines.
Simulations using level-three SPICE models from a recent MOSIS run verified the operation
of the memory core. Waveforms from simulations of both core organizations are similar, and
results for the first core organization are shown in figure 7. The first operation was a read,
followed by a write. Figure 15 shows the fraction of energy recovered at various speeds of
operation. For a stimulus with a transition time of 10 ns, energy recovery was around 50% for
both read and write operations.
We also simulated the extracted circuits of 6-to-64 NOR and NAND decoders implemented
in the same technology; the results are shown in figure 17. They indicate that approximately 90%
of the energy can be recovered for both organizations when the stimulus has a transition time of
10 ns. However, as figure 18 suggests, the energy dissipation of the NAND-array decoder is
far less than that of the NOR-array decoder. The reason is that only a single row is pulled
down in the NAND decoder, while N - 1 rows are pulled down in the NOR decoder, as
discussed in section 3.
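
The difference in switching activity between the two decoder styles can be illustrated with a purely logic-level model. The Python sketch below is not a circuit simulation and does not model the adiabatic supplies. It only counts how many of the N = 2^n output rows end up low for a given address, assuming an active-high NOR decoder and an active-low NAND decoder. All function names are hypothetical.

```python
def nor_decoder_rows(address, n_bits):
    """Row outputs of an n-to-2^n NOR decoder (selected row stays high)."""
    rows = []
    for row in range(2 ** n_bits):
        # A NOR row is pulled down if any address bit mismatches the row index.
        mismatches = [((address >> b) & 1) != ((row >> b) & 1) for b in range(n_bits)]
        rows.append(0 if any(mismatches) else 1)
    return rows

def nand_decoder_rows(address, n_bits):
    """Row outputs of an n-to-2^n NAND decoder (selected row goes low)."""
    rows = []
    for row in range(2 ** n_bits):
        # A NAND row is pulled down only if every address bit matches the row index.
        matches = [((address >> b) & 1) == ((row >> b) & 1) for b in range(n_bits)]
        rows.append(0 if all(matches) else 1)
    return rows

def rows_pulled_down(rows):
    """Number of rows whose output is low."""
    return rows.count(0)

N_BITS = 6      # 6-to-64 decoder, as in the text
ADDRESS = 37    # arbitrary example address

print("NOR  rows pulled down:", rows_pulled_down(nor_decoder_rows(ADDRESS, N_BITS)))
print("NAND rows pulled down:", rows_pulled_down(nand_decoder_rows(ADDRESS, N_BITS)))
```

For a 6-to-64 decoder this gives 63 rows pulled down in the NOR array against a single row in the NAND array, matching the N - 1 versus 1 argument above.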
Although a level-shifter with high energy recovery is difficult to design, it should be noted
that it only drives a buffer. The energy loss at the voltage-level-shift stage is insignificant. The
I/O buffer, which has a heavy capacitive load, can operate in adiabatic fashion and can recover
most of the energy by using the transmission gate construct. Our simulation results indicate
that more than 90% energy recovery can be achieved with a 1 pF capacitive load at 10 ns of
stimulus transition time.
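
The dependence of energy recovery on transition time can be made concrete with the usual first-order model of adiabatic charging. In that model, driving a capacitance C through an effective resistance R with a ramp of duration T much longer than RC dissipates about (RC/T) C V^2 per transition, whereas conventional switching dissipates C V^2 per charge/discharge cycle. The Python sketch below uses the 1 pF load and 10 ns transition time mentioned above. The effective resistance R_EFF is a hypothetical value, and the result is a trend illustration, not a reproduction of the simulated results.

```python
def conventional_cycle_energy(c_load, v_swing):
    """Energy dissipated by one conventional charge/discharge cycle: C * V^2."""
    return c_load * v_swing ** 2

def adiabatic_cycle_energy(r_eff, c_load, v_swing, t_ramp):
    """First-order ramp model (valid for t_ramp >> r_eff * c_load).

    Each of the two transitions dissipates (RC/T) * C * V^2.
    """
    return 2.0 * (r_eff * c_load / t_ramp) * c_load * v_swing ** 2

def recovered_fraction(r_eff, c_load, t_ramp, v_swing=1.0):
    """Fraction of the conventional cycle energy that is not dissipated."""
    e_conv = conventional_cycle_energy(c_load, v_swing)
    e_adia = adiabatic_cycle_energy(r_eff, c_load, v_swing, t_ramp)
    return 1.0 - e_adia / e_conv

C_LOAD = 1e-12   # 1 pF load, from the text
R_EFF = 250.0    # hypothetical effective driver resistance in ohms (assumption)

for t_ns in (10, 20, 50, 100):
    frac = recovered_fraction(R_EFF, C_LOAD, t_ns * 1e-9)
    print(f"T = {t_ns:3d} ns -> recovered fraction = {frac:.3f}")
```

In this model the recovered fraction does not depend on the voltage swing, and it approaches unity as the transition time grows relative to RC.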
7 Summary 
In this paper, we have demonstrated a novel application of the principle of adiabatic switching
to the design of static RAM, which can be implemented without significantly increasing area
or circuit complexity. Our design also provides the flexibility for possible optimizations of the
SRAM from overall architectural considerations. Results indicate that the essential advantage
of adiabatic logic, that of low power, is achievable in SRAM.
Certain features of our adiabatic SRAM designs are summarized as follows: 
• Very small standby current, since a 6-transistor cell organization is used.
• Bit-line peripheral circuitry has been eliminated except for the equalization transistor.
• There is no need for pre-charge, since the bit-lines revert to their rest-state voltage level at
  the end of each operation.
• The sense amplifier has been replaced by the adiabatic voltage level shifter and hence
  short-circuit current has been minimized.
• Address transition detection (ATD) circuitry, which is used to reduce the short-circuit
  current of the sense amplifiers in standard CMOS SRAMs, is not adopted. ATD circuitry
  itself, however, consumes a considerable amount of energy [8]. Fully global controls are
  required in adiabatic designs for energy recovery, and hence self-timed approaches are
  not used in our design. The overall supply scheme for the SRAM is shown in figure 12.
• Control circuits are not addressed in this paper because they are not among the main
  components of energy dissipation in SRAMs and can be implemented in standard CMOS.
In this paper, the design of the power supplies required by the SRAM has not been
addressed. Whether the above designs can be used in actual practice hinges largely
on the ability to ensure proper supplies to the SRAM. At this moment, on-chip methods for
generating those supply waveforms are not available. It is, however, feasible to generate off-chip
supplies [1, 3, 5] to make adiabatic operation of memories a reality.
References 
[1] W. C. Athas, L. "J." Svensson, J. G. Koller, N. Tzartzanis, and E. Chou, "A Framework for
Practical Low-Power Digital CMOS Systems Using Adiabatic-Switching Principles," Intl.
Workshop on Low Power Design, Napa Valley, California, 1994, pp. 189-194.
[2] J. S. Denker, S. C. Avery, A. G. Dickinson, A. Kramer, and T. R. Wik, "Adiabatic Computing
with the 2N-2N2D Logic Family," Intl. Workshop on Low Power Design, Napa Valley,
California, 1994, pp. 183-187.
[3] S. G. Younis and T. F. Knight, "Asymptotically Zero Energy Split-Level Charge Recovery
Logic," Intl. Workshop on Low Power Design, Napa Valley, California, 1994, pp. 177-182.
[4] A. P. Chandrakasan, S. Sheng, and R. Brodersen, "Low Power CMOS Digital Design,"
IEEE Journal of Solid-State Circuits, vol. 27, no. 4, Apr. 1992, pp. 473-483.
[5] A. G. Dickinson and J. S. Denker, "Adiabatic Dynamic Logic," IEEE Journal of Solid-State
Circuits, vol. 30, no. 3, March 1995, pp. 311-315.
[6] Y. Ye and K. Roy, "Energy Recovery Circuits Using Reversible and Partially Reversible
Logic," IEEE Trans. on Circuits and Systems I, to appear.
[7] L. A. Glasser and D. W. Dobberpuhl, The Design and Analysis of VLSI Circuits, pp. 390-
393.
[8] S. T. Flannagan, P. H. Pelley, N. Herr, B. E. Engles, T. Feng, S. G. Nogle, J. W. Eagan, R. J.
Dunnigan, L. J. Day and R. I. Kung, "8-ns CMOS 64K x 4 and 256K x 1 SRAM's," IEEE
Journal of Solid-State Circuits, vol. 25, no. 5, Oct. 1990, pp. 1049-1056.
[9] W. C. Athas, L. "J." Svensson, J. G. Koller, N. Tzartzanis, and E. Chou, "Low-Power Digital
Systems Based on Adiabatic-Switching Principles," IEEE Trans. on VLSI Systems, vol. 2,
no. 4, Dec. 1994, pp. 398-407.




Figure 1: Charging and discharging in standard CMOS and adiabatic fashion
[Block diagram labels: DC power supply; AC power supply/clock waveform generator; adiabatic logic circuits & memory]




Figure 3: SRAM organization (diagram labels: memory cell array; inputs a1, a2, ..., an; voltage level shifter; I/O buffer; DATA)




Figure 4: Memory core organization (diagram labels: M-1 rows; CSEL; I/O line)
Figure 5: Another memory core organization. Vhi and Vlow run vertically and are generated by [...] (diagram labels: Gword; Vword; N-1 columns; M-1 rows; Vlow; Vhi)





Figure 6: The row driver circuit (signal labels: Gword, Glow, Ghi)
Figure 7: Waveforms for the SRAM core (x-axis: time in ns; a read followed by a write)
Figure 8: Configurations of adiabatic address decoder: (a) 2-to-4 adiabatic NOR decoder; (b) 2-to-4 adiabatic NAND decoder
Figure 9: Waveforms of adiabatic address decoder (plot title: the adiabatic 6-to-64 address decoder; axes: voltage vs. time)
Figure 10: Adiabatic level-shifter and I/O buffer
[Plot: the adiabatic 64x64 SRAM core with level-shifter and I/O buffer; axes: voltage vs. time]






Figure 12: Overall supply schemes for the adiabatic SRAM (timing annotations: access time, cycle time)
Figure 13: Circuit models of adiabatic charging/discharging
Figure 14: Layout of two SRAM cell configurations. The cell shown on the left side is for the row-activated memory core organization, and the cell shown on the right side is for the column-activated memory core organization, in which only the selected columns are active. The supply lines Vhi and Vlow run horizontally in the left cell and vertically in the right cell.
Figure 15: Energy recovery in adiabatic memory core (x-axis: transition time in ns; legend includes read and write in design 1)
Figure 16: Comparison of energy dissipation in two memory core organizations
Figure 17: Energy recovery in 6-to-64 adiabatic NAND and NOR decoders (x-axis: transition time in ns)
Figure 18: Comparison of energy dissipation in adiabatic NAND and NOR decoder (x-axis: transition time in ns, 0.1 to 10)
