In this paper, we propose an adiabatic register file for ultra-low-energy applications, which uses a new reversible adiabatic logic, nRERL [1]. The nRERL register file discards garbage information with minimal energy dissipation. We designed a 16x8b three-port nRERL register file. From SPICE simulations, we found that the nRERL register file consumes less than 10% of the energy consumed by a conventional register file at frequencies below 1 MHz. We also describe how to design a RAM, a large array of storage cells.
INTRODUCTION
Adiabatic circuits are useful for ultra-low-energy applications at low operating frequencies because they consume less energy as their operating speed decreases [2]. Recently we proposed nMOS reversible energy recovery logic (nRERL) [1]. Because it uses only bootstrapped switches and nMOS transistors, its circuit complexity and energy consumption are reduced substantially compared to other fully adiabatic logic. There have been several papers on adiabatic memories for low-energy applications [3] [4] [5] [6], which were based upon the conventional SRAM cell. However, they are limited in how far they can reduce energy consumption at lower operating speeds because they have a large non-adiabatic loss, which does not depend on the operating frequency. In contrast, a reversible memory uses a swap operation instead of erasing the stored bit during read or write operations [7] [8], because erasing a bit incurs non-adiabatic energy dissipation. Recently, a reversible memory was proposed in [9]. However, this fully reversible memory is difficult to implement because of the garbage information. When new data is written to a memory cell, the old data becomes garbage, which must be stored somewhere to avoid the non-adiabatic loss of erasing it. Therefore, a huge garbage stack is required. Although a small array of reversible memory was implemented in [9], a large array of fully reversible memory is still not realizable because of the large overhead of the garbage stack. This paper describes an nRERL register file, which employs the reversible adiabatic logic, nRERL, and discards garbage information with minimal energy dissipation. A new nRERL storage cell and other blocks are designed. A method to design a RAM, a large array of the storage cells, is also described.
ISLPED '00, Rapallo, Italy. Copyright 2000 ACM 1-58113-190-9/00/0007 ... $5.00.
nRERL STORAGE CELL
nRERL
The nRERL [1], which uses a simpler 6-phase clocked power, is an improved version of the previous RERL circuits. It is simpler, and its energy consumption and area overhead are reduced substantially, because nRERL uses only nMOS transistors by exploiting bootstrapped nMOS switches. An nRERL buffer and its 6-phase clock are shown in Fig. 1. A detailed description of nRERL and its clocked power generator can be found in [1].
Storage Cell
The storage cell shown in Fig. 2 consists of 14 transistors and is used in the three-port (2-read, 1-write) nRERL register file. It uses basically the same clocking scheme as the nRERL buffer in Fig. 1. The transistors M1 and M2 control the write path from the write bit-lines to the storage node X0 and its complement X̄0, respectively, and M3, M4, M5 and M6 control the read path for read port 0. Similarly, M3, M4, M7 and M8 control the read path for read port 1. The transistors M9, M10, M11 and M12 form a Self-Energy-Recovery Circuit (SERC) [10], which recovers the energy of X0 and X̄0 in the unwrite operation and supplies it back to them in the refresh operation. Note that the SERC in [10] was designed with transmission gates. The clamp transistors, M13 and M14, hold the undriven storage node at ground. The gate capacitances of M3, M4, M9, M10, M13 and M14 form the storage capacitance of the memory cell. All transistors in the storage cell are minimum-sized at 0.36 µm/0.24 µm, except for M3 and M4, which are 0.92 µm/0.24 µm.

Clock Gating Method for a Single-Rail Signal
A clock gating signal must nest the clocked power signal to avoid non-adiabatic loss. When a gating signal is enabled, it must stay high for at least three phases, as shown in Fig. 3, so that it can nest the clocked power. This clock gating method for a single-rail signal is used to activate only the selected part of a memory.
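The nesting rule can be illustrated with a small behavioral check. This is only a sketch under stated assumptions: time is measured in whole clock phases, each signal is described by a half-open interval, and "nesting" is taken to mean that the gating signal is high before the clocked power starts to move and stays high until it has returned to rest. The function name and the interval representation are hypothetical and do not come from the nRERL design itself.

```python
# Behavioral sketch of the nesting rule for a single-rail gating signal.
# Intervals are half-open (start, end) pairs measured in clock phases.
# All names are hypothetical; only the three-phase minimum comes from the text.

MIN_GATE_HIGH_PHASES = 3  # the gating signal stays high for at least three phases


def gate_nests_clock(gate_high, clock_active):
    """Return True if the gating interval encloses the clocked-power interval."""
    gate_start, gate_end = gate_high
    clock_start, clock_end = clock_active
    long_enough = (gate_end - gate_start) >= MIN_GATE_HIGH_PHASES
    # Assumption: the gate must already be high when the clocked power starts
    # to change and must still be high when the clocked power is back at rest.
    encloses = gate_start <= clock_start and clock_end <= gate_end
    return long_enough and encloses


if __name__ == "__main__":
    print(gate_nests_clock((0, 3), (1, 2)))  # gate encloses the clock pulse
    print(gate_nests_clock((0, 4), (3, 5)))  # gate falls while the clock is active
```

A check of this kind could be run over a control-signal schedule to flag gating signals that would switch while the gated clocked power is still active.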
Control Signals and Operations
The control signals in the storage cell in Fig Inside the cell.
Architecture
The block diagram of an nRERL register file is shown in Fig. 6. In a read operation, the "read" signal is evaluated by φ0 and its output data is evaluated by φ4, so the latency of the read operation is 4 phases. In a write operation, the "write" signal is evaluated by φ2 and its input data is written to the cell when φ2 rises in the next cycle; the latency of the write operation is 6 phases because an unwrite operation is performed before the write operation. For proper memory operation, the selection signals for each word-line in Fig. 4 must be generated separately. Fig. 7 shows the write address decoder, which is an N-to-2^N demultiplexer, where N is the bit-width of the address. To reduce the energy dissipation, the decoder is enabled only when its control signal is high. An additional buffer chain, which generates several delayed addresses, is necessary to recover the energy of the decoded signals. The energy of the address is recovered at the end of its buffer chain with the SERCs after the node capacitance of the delayed address has been reduced. The read address decoder is similar to the write address decoder. The data-in, data-out and controller blocks are simple and consist mostly of buffers. Except for the nodes connected to the SERCs, the energy of every node in the nRERL register file is recycled without non-adiabatic loss. In particular, the energy of the bit-lines and word-lines, which have large capacitance, is recovered without any non-adiabatic loss.
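The logical behavior of the gated address decoder can be sketched as follows. This is a purely functional model, assuming a one-hot word-line encoding and an active-high enable. It does not model the delayed-address buffer chain, the SERCs, or any energy recovery, and the names are hypothetical.

```python
# Functional sketch of an N-to-2^N address decoder with an enable input.
# Only the logic is modeled; timing and energy recovery are out of scope.

READ_LATENCY_PHASES = 4   # read latency stated in the text
WRITE_LATENCY_PHASES = 6  # write latency stated in the text (unwrite, then write)


def decode_address(address, n_bits, enable):
    """Return a one-hot list of 2**n_bits word-line values (all zero if disabled)."""
    lines = 2 ** n_bits
    if not 0 <= address < lines:
        raise ValueError("address does not fit in n_bits")
    if not enable:
        # A disabled decoder drives no word line, so nothing downstream switches.
        return [0] * lines
    return [1 if index == address else 0 for index in range(lines)]


if __name__ == "__main__":
    print(decode_address(5, 4, True))
    print(decode_address(5, 4, False))
```

For the 16-word register file, `n_bits` would be 4, giving 16 word lines of which at most one is active.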
The operation sequence in the timing diagram is as follows: refresh the data, read it, write new data, and read it again.
Hierarchical Design for a RAM Expansion
Generally, a RAM is a large array of storage cells. Expanding an nRERL register file to a RAM raises two problems: the overhead of the large address decoder and the energy overhead due to the large capacitance of the bit-lines. In this section, we propose a hierarchical design to solve these problems.
Two-Level Address Decoding
The area overhead of an address decoder is large when the bit-width of the address is large. This overhead can be reduced substantially by two-level decoding, because it reduces the number of buffers exponentially. In two-level address decoding, the two decoded signals, wl-high and wl-low, from the two separate decoders are combined as shown in Fig. 9. If the address decoder is divided into more than two parts, the complexity of the decoder decreases further, at the cost of an additional cascaded gating circuit.
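A small functional sketch may help show why two-level decoding shrinks the decoder. It assumes the address is split into a high and a low field, each field is decoded to a one-hot vector (standing in for wl-high and wl-low), and a word line is selected when both of its partial signals are high. The count of decoded signals is used here only as a rough proxy for decoder size. It is not the buffer count of the actual circuit, and all function names are hypothetical.

```python
# Functional sketch of two-level address decoding.
# wl_high and wl_low are one-hot vectors from two smaller decoders; a word
# line is active only when both of its partial select signals are high.


def one_hot(value, n_bits):
    """Decode value into a one-hot list of length 2**n_bits."""
    return [1 if index == value else 0 for index in range(2 ** n_bits)]


def two_level_word_lines(address, n_bits):
    """Combine the high-field and low-field decoders into 2**n_bits word lines."""
    low_bits = n_bits // 2
    high_bits = n_bits - low_bits
    wl_high = one_hot(address >> low_bits, high_bits)
    wl_low = one_hot(address & ((1 << low_bits) - 1), low_bits)
    # Word line k = high * 2**low_bits + low is the AND of its two partial signals.
    return [h & l for h in wl_high for l in wl_low]


def decoded_signal_counts(n_bits):
    """Return (single-level, two-level) counts of decoded signals."""
    low_bits = n_bits // 2
    high_bits = n_bits - low_bits
    return 2 ** n_bits, 2 ** high_bits + 2 ** low_bits


if __name__ == "__main__":
    print(two_level_word_lines(9, 4))
    print(decoded_signal_counts(4))
    print(decoded_signal_counts(10))
```

With a 4-bit address split into two 2-bit fields, the two small decoders produce 4 + 4 signals instead of 16, and the gap widens quickly as the address grows.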
Bit Line Separation Using a Conditional Reversible Mux
We also need to reduce the large capacitance of a bit line. We propose to divide a bit line into several lines and combine them with a conditional reversible mux. Generally, a mux is not reversible because it is not a one-to-one mapping. However, it can be reversible if a condition is satisfied. For example, a conditional reversible 2-to-1 mux is shown in Fig. 10. This 2-to-1 mux requires a delayed copy S* of the select signal S to make it reversible. If S is high, A is selected and its energy is recovered by S*, and B must be in the clear state. If S is low, B is selected and its energy is recovered by the complement of S*, and A must be in the clear state. The conditional reversible mux can be used to reduce the energy dissipation in a bit line as shown in Fig. 11, where each bit line is divided into four. The 1-to-4 demux, which is the reverse of the 4-to-1 mux, selects only one out of four write bit lines and puts the others in the clear state.
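The condition that makes the mux reversible can be checked at the logic level. The sketch below treats the clear state as logic 0 and models only the mapping between inputs and output. It does not model the delayed select copy or the energy recovery path, and the function names are hypothetical.

```python
# Logic-level sketch of a conditional reversible 2-to-1 mux and its inverse.
# Assumption: the "clear state" of an unselected input is logic 0.


def mux2(a, b, s):
    """Select a when s is high, b when s is low; the other input must be clear."""
    if s:
        if b != 0:
            raise ValueError("B must be in the clear state when S is high")
        return a
    if a != 0:
        raise ValueError("A must be in the clear state when S is low")
    return b


def demux2(out, s):
    """Inverse mapping: rebuild (a, b) from the output and the select signal."""
    return (out, 0) if s else (0, out)


if __name__ == "__main__":
    # Every input combination that satisfies the condition is recovered exactly.
    for a, b, s in [(0, 0, 1), (1, 0, 1), (0, 0, 0), (0, 1, 0)]:
        print((a, b, s), demux2(mux2(a, b, s), s))
```

Without the clear-state condition, inputs such as (1, 1) and (1, 0) with S high would map to the same output, so the inputs could not be reconstructed and the mapping would not be one-to-one.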
The 4-to-1 mux selects only one out of four read bit lines. This method reduces the adiabatic loss substantially because the adiabatic loss of a node is proportional to the square of its capacitance.
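The benefit of splitting a bit line can be estimated with the usual first-order model of adiabatic loss, E ≈ (RC/T)·C·V², which is quadratic in C as stated above (up to a constant factor that depends on the ramp shape). The sketch assumes that the capacitance divides evenly among the segments, that only the selected segment is driven, and that the switch resistance R and ramp time T are unchanged. The extra loss of the mux and demux is ignored, so the result is an optimistic bound rather than a simulation result.

```python
# First-order estimate of adiabatic loss before and after bit-line separation.
# Assumptions: equal segments, only one segment driven, same R and T,
# mux/demux overhead ignored. All names are hypothetical.


def adiabatic_loss(capacitance, resistance, ramp_time, voltage):
    """First-order adiabatic loss, E = (R*C/T) * C * V**2."""
    return (resistance * capacitance / ramp_time) * capacitance * voltage ** 2


def split_loss_ratio(segments):
    """Loss of one driven segment relative to the undivided bit line."""
    full = adiabatic_loss(1.0, 1.0, 1.0, 1.0)
    part = adiabatic_loss(1.0 / segments, 1.0, 1.0, 1.0)
    return part / full


if __name__ == "__main__":
    print(split_loss_ratio(2))  # bit line divided into two
    print(split_loss_ratio(4))  # bit line divided into four
```

Under these assumptions, dividing a bit line into k segments scales the loss of the driven segment by 1/k².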
Simulation Results
A 3-port 16x8b nRERL register file was designed in the Anam 0.25-µm n-well 5-metal process. The area of the proposed register cell with 14 transistors was less than twice that of the conventional 3-port SRAM cell with 10 transistors. Each of the three address decoders was divided into two with two-level address decoding: one for the higher 2 bits and one for the lower 2 bits. In addition, each read and write bit line was divided into two. The layout of the nRERL register file is shown in Fig. 12; its area is 500.1 µm x 331.8 µm. Its energy consumption per cycle, shown in Fig. 13, was estimated by SPICE simulation under the condition that two read operations and one write operation were performed simultaneously at a supply voltage of 2.5 V. Its energy curve was similar to those of the nRERL logic circuits [1]. The energy consumption of the nRERL register file was separated into 7 components, which are shown in Fig. 13: two write address decoders, four read address decoders, controller, data-in, two write bit lines (including the cells), four read bit lines, and data-out. The decoders consumed the most energy in the register file, even though they were designed with two-level address decoding. The energy dissipation in the decoders is large because the word-line selection signals in each decoder are maintained for 4 phases and because there are six decoders for the 2R1W configuration. A three-port static CMOS register file was designed with a conventional 10-T SRAM cell, and its power dissipation was compared with that of the nRERL register file, as shown in Fig. 14.
In the CMOS register file, we did not include the power consumption of the sense amplifiers and bit-line pull-up circuits because they are not required in low-speed operation [6]. The results show that the nRERL register file consumes 6.6% of the power dissipated in the CMOS one at 1 MHz, the energy-minimal frequency in Fig. 13. The adiabatic memories in [3], [4] and [5] are hybrid designs with the conventional CMOS SRAM cell. The adiabatic register file in [6] and our nRERL register file exploit fully adiabatic circuits.
Conclusions
We proposed an nRERL register file, which employs the reversible adiabatic logic, nRERL, and discards garbage information with minimal energy dissipation. A clock gating method for a single-rail signal was used to reduce the energy dissipation. We also recycle the energy of the bit-lines. From SPICE simulation, we found that the nRERL register file has substantial advantages in energy consumption at low operating frequencies. We also proposed to use a hierarchical design for an nRERL
