Abstract - We describe the design of an nRERL microprocessor for ultra-low-energy applications. nRERL (nMOS Reversible Energy Recovery Logic) is a new reversible adiabatic logic family that uses only nMOS transistors and can be operated at the leakage-current level [1]. We focus on two main issues: first, the design of a fully adiabatic microprocessor, which uses only adiabatic components for all the functional blocks; second, the energy consumption of the nRERL microprocessor including its clocked power generator (CPG).
I. INTRODUCTION
The main objective in designing an adiabatic system such as the nRERL microprocessor is to consume as little energy as possible. Only a few papers on adiabatic microprocessors have been published. Vieri's 32-bit reversible computer [2] is notable work, but it focuses on reversible computation itself and on the software level. To our knowledge, the microprocessors of Athas and Tzartzanis [3] are the only adiabatic microprocessors reported so far; they recycled the energy of nodes with large load capacitance by using clocked buffers, but used conventional circuits for all the other components.
This paper focuses on the design of a complex, fully adiabatic microprocessor. In the nRERL microprocessor, all the functional blocks except the CPG blocks are implemented with nRERL. The proposed nRERL microprocessor is optimized for energy consumption in three ways. First, we minimize the number of nRERL buffers using phase-based pipelining. Second, we break logic reversibility with Self Energy Recovery Circuits (SERCs) [1]. Third, we use a more energy-efficient CPG based on two-step compensation.
The architecture of the microprocessor is presented in Section II, followed by the energy consumption analysis in Section III. Experimental results are given in Section IV, and conclusions in Section V.
II. ARCHITECTURE
nRERL substantially reduces energy consumption and area overhead because it uses only nMOS transistors, exploiting bootstrapped nMOS switches. A detailed description of nRERL can be found in [1].
The nRERL microprocessor is based on the DLX instruction set architecture, which is slightly modified to suit the nRERL microprocessor without losing any of its features. Instructions are 20 bits wide and the data path is 8 bits wide. A total of 19 instructions are implemented: add, sub, and, or, xor, slt, addi, lw, sw, jr, jalr, sp, beqz, bnez, j, jal, nop/ref (refresh), lwp and swp.
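As a quick consistency check on the instruction list above, the following sketch counts the mnemonics and computes the minimum opcode width needed to distinguish them. The actual field layout of the 20-bit instruction word is not given here, so `min_opcode_bits` yields only a lower bound, and all identifiers are illustrative rather than taken from the design.

```python
# Mnemonics as listed in the text; "nop/ref" is counted as one instruction.
INSTRUCTIONS = [
    "add", "sub", "and", "or", "xor", "slt", "addi", "lw", "sw", "jr",
    "jalr", "sp", "beqz", "bnez", "j", "jal", "nop/ref", "lwp", "swp",
]

INSTRUCTION_BITS = 20  # instruction word size stated in the text
DATAPATH_BITS = 8      # data-path width stated in the text


def min_opcode_bits(n_instructions: int) -> int:
    """Smallest number of bits that gives each instruction a distinct code."""
    # (n - 1).bit_length() is an exact integer form of ceil(log2(n)).
    return max(1, (n_instructions - 1).bit_length())


if __name__ == "__main__":
    n = len(INSTRUCTIONS)
    bits = min_opcode_bits(n)
    # Assumption: a dense opcode field; the real encoding may differ.
    print(f"{n} instructions need at least {bits} opcode bits, "
          f"leaving at most {INSTRUCTION_BITS - bits} bits for operands")
```

With 19 instructions the lower bound is 5 opcode bits, which leaves at most 15 bits of the instruction word for register specifiers and immediates.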
Fig. 1 shows the micro-architecture of the 8-bit nRERL microprocessor together with its pipelining diagram. Note that a clock cycle consists of 6 phases in nRERL. With cycle-based pipelining, many nRERL buffers would be needed to retain node information. We therefore exploit phase-based pipelining, which is an inherent property of adiabatic switching, to reduce the number of buffers. In addition, we designed the system with no phase margin to minimize the number of buffers further. One instruction completes in 28 phases, a figure that was minimized to reduce the number of nRERL buffers and the energy consumption. Consequently, each DLX instruction is executed in at most five clock cycles.
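The relation between the 28-phase instruction latency and the five-cycle bound is a ceiling division by the 6 phases per clock cycle. A minimal sketch, with hypothetical function names:

```python
PHASES_PER_CYCLE = 6         # one nRERL clock cycle, from the text
PHASES_PER_INSTRUCTION = 28  # time to complete one instruction, from the text


def cycles_needed(phases: int, phases_per_cycle: int = PHASES_PER_CYCLE) -> int:
    """Whole clock cycles required to cover `phases` (ceiling division)."""
    return -(-phases // phases_per_cycle)


def slack_phases(phases: int, phases_per_cycle: int = PHASES_PER_CYCLE) -> int:
    """Phases left unused in the last clock cycle."""
    return cycles_needed(phases, phases_per_cycle) * phases_per_cycle - phases


if __name__ == "__main__":
    print(cycles_needed(PHASES_PER_INSTRUCTION))  # 5
    print(slack_phases(PHASES_PER_INSTRUCTION))   # 2
```

The 28 phases occupy four full cycles plus four phases of a fifth, so an instruction fits within five clock cycles.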
We used only adiabatic components for all the functional blocks of the nRERL microprocessor. However, we break logic reversibility with SERCs where doing so saves area or reduces energy dissipation. An SERC recovers the energy stored in a node at the cost of a $CV_{th}^{2}$ energy loss, and we use it when the energy loss of the additional buffers needed for reversibility would be large. The SERC can also be used, with a different clock scheme, for the unwrite operation in nRERL memories.
nRERL memories employ the concept of reversible storage. Instead of moving the stored data to another storage cell, we use SERCs so that almost all of the stored energy can be recovered. In the unwrite operation, which precedes the write operation, the SERCs in a register file or a RAM unwrite the stored data to clear the storage cell. A detailed description of the design of nRERL memories can be found in [4].
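The ordering constraint described above (a cell is cleared by an unwrite before it is written) can be illustrated with a purely behavioral toy model. It says nothing about the circuit or its energy; the class and function names are hypothetical and are not taken from [4].

```python
from typing import Optional


class ReversibleCell:
    """Toy model of a storage cell that must be cleared before it is written."""

    def __init__(self) -> None:
        self.value: Optional[int] = None  # None represents a cleared cell

    def unwrite(self) -> Optional[int]:
        """Clear the cell and return what it held (stands in for the SERC step)."""
        old, self.value = self.value, None
        return old

    def write(self, value: int) -> None:
        """Store a value; only legal on a cleared cell in this model."""
        if self.value is not None:
            raise RuntimeError("cell must be unwritten before it is written")
        self.value = value


def store(cell: ReversibleCell, value: int) -> None:
    """Unwrite first, then write, mirroring the order described in the text."""
    cell.unwrite()
    cell.write(value)


if __name__ == "__main__":
    cell = ReversibleCell()
    store(cell, 0x3C)
    store(cell, 0x07)  # the old value is cleared before the new one is stored
    print(cell.value)
```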
We propose a 6-phase CPG using 2-step compensation. First, a capacitance compensation circuit equalizes the static capacitance of each rail. Second, a frequency tracking circuit compensates for the effects of dynamic capacitance variation. The frequency tracking circuit helps the transitions of heavily loaded nodes: it supplies more charge to slowly rising nodes and drains stored charge from slowly falling nodes. The block diagram of the clocked power generator is shown in Fig. 2. Note that the frequency tracking circuit is connected to both terminals of the inductor, and static compensation blocks are connected to each clock rail. A detailed description of the basic operation of the CPG can be found in [5].
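The first compensation step can be pictured as padding each rail so that all rails present the same static capacitance. The sketch below assumes one rail per phase and equalization up to the largest rail capacitance; both are assumptions made for illustration, the capacitance values are invented, and the actual circuit in [5] may work differently.

```python
from typing import List


def compensation_caps(rail_caps_pf: List[float]) -> List[float]:
    """Capacitance to add to each rail so that every rail matches the largest one.

    Assumption: equalization by padding up to the maximum rail capacitance.
    """
    target = max(rail_caps_pf)
    return [target - c for c in rail_caps_pf]


if __name__ == "__main__":
    # Hypothetical static capacitances (pF) for six clock rails.
    rails = [10.0, 12.5, 9.0, 12.0, 11.0, 8.5]
    extra = compensation_caps(rails)
    for i, (c, d) in enumerate(zip(rails, extra)):
        print(f"rail {i}: {c} pF + {d} pF = {c + d} pF")
```

The second step, frequency tracking, responds to data-dependent capacitance changes at run time and is not modeled here.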
III. ENERGY CONSUMPTION ANALYSIS
The CPG is the most energy-dissipative part of the nRERL microprocessor, and the nRERL function blocks can be regarded as an equivalent load.
Therefore, the total energy loss can be expressed as shown in (1):
$$E_{total} = E_{adiabatic} + E_{constant} + E_{switching} + E_{leakage} + E_{mismatch} + E_{compensation} \qquad (1)$$
where $E_{adiabatic}$ is the adiabatic energy loss in the rail driver switch and the load, which is inversely proportional to the rail driver switch size. $E_{constant}$ is the nonadiabatic energy loss consumed in the CPG control block and in the SERCs, which is constant. $E_{switching}$ is the nonadiabatic energy loss for driving the gate capacitance of the large rail driver switch, which is proportional to the rail driver switch size. $E_{leakage}$ is the energy loss due to the leakage current of the MOSFETs, and $E_{mismatch}$ and $E_{compensation}$ are the energy losses due to the capacitance mismatch and the compensation in the CPG, respectively. The total energy is a function of the load capacitances, transistor sizes, operating frequency, and other parameters. From (1), we deduce that the energy-optimal frequency is about 2 MHz. The energy-optimal sizes of the rail driver, one of the most important design factors, were chosen from simulation as 276 µm / 0.24 µm for pMOS and 138 µm / 0.24 µm for nMOS.
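The sizing trade-off described above can be sketched numerically. The model below keeps only the two size-dependent terms, an adiabatic loss inversely proportional to the switch width and a switching loss proportional to it, and lumps everything else into a constant. The coefficients are hypothetical and are not fitted to the chip; the sizes reported in the text came from simulation, not from this formula.

```python
import math


def total_energy(width: float, k_adiabatic: float, k_switching: float,
                 e_other: float = 0.0) -> float:
    """E(W) = k_adiabatic / W + k_switching * W + e_other.

    Assumption: all remaining terms are treated as independent of W.
    """
    return k_adiabatic / width + k_switching * width + e_other


def optimal_width(k_adiabatic: float, k_switching: float) -> float:
    """Width minimizing E(W); follows from dE/dW = 0."""
    return math.sqrt(k_adiabatic / k_switching)


if __name__ == "__main__":
    k_a, k_s, e0 = 400.0, 1.0, 5.0  # hypothetical coefficients
    w_opt = optimal_width(k_a, k_s)
    print(w_opt, total_energy(w_opt, k_a, k_s, e0))
    # At the optimum the two size-dependent terms are equal.
    print(k_a / w_opt, k_s * w_opt)
```

A wider switch lowers the resistive (adiabatic) loss but costs more energy to drive its gate, so the minimum lies where the two contributions balance.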
Sometimes $E_{mismatch} + E_{compensation}$ is much larger than the other components. With two-step compensation, the energy loss of the CPG due to capacitance mismatch is successfully reduced, as shown in Fig. 3.
IV. EXPERIMENTAL RESULTS
The chip was fabricated in the Anam 0.25-µm 5-metal n-well CMOS process, with $V_{dd}$ = 2.5 V, $V_{tho}$ = 0.3 V and $V_{thb}$ = 0.5 V. Fig. 4 shows a microphotograph of the chip. The core area is 2.62 mm × 2.03 mm, and the total number of transistors is 98,000. The total energy consumption of the nRERL microprocessor was 26.22 pJ at $V_{dd}$ = 2.5 V and f = 440 kHz, which is almost the same as the simulation result. Although the maximum operating frequency is about 10 MHz, the energy-optimal frequency in measurement is 440 kHz.
The estimated energy consumption of each component in the nRERL microprocessor and in the CMOS one is summarized in Table I. To allow a fair comparison, the energy consumption of the sense amplifiers and bit-line pull-up circuits in the register file and RAM is excluded; these circuits are an important issue in fast, high-performance microprocessors and take up 97 % of the total energy dissipation. The estimates were obtained by SPICE simulation at $V_{dd}$ = 2.5 V and f = 2 MHz, which is the energy-optimal operating frequency of nRERL in simulation. At that frequency the leakage losses and the adiabatic losses are about equal [1]. As can be seen from the table, the main portion of the energy consumption is in the clock driver.
The total energy consumption of the nRERL microprocessor was 13.6 % of that of the static CMOS one, which is about half the relative consumption of Athas's approach. Note that Athas's microprocessor [3] consumed about 30 % of the energy of the static CMOS one.
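The observation that leakage and adiabatic losses are about equal at the energy-optimal frequency follows from a simple first-order model, sketched here under the assumption that the adiabatic loss per cycle grows linearly with frequency while the leakage energy per cycle falls as $1/f$. The coefficients $\alpha$ and $\beta$ and the frequency-independent term $E_0$ are illustrative symbols, not quantities taken from the measurements.

```latex
\begin{align*}
E(f) &= \alpha f + \frac{\beta}{f} + E_0, \\
\frac{dE}{df} &= \alpha - \frac{\beta}{f^{2}} = 0
  \;\Longrightarrow\; f_{opt} = \sqrt{\beta/\alpha}, \\
\alpha f_{opt} &= \frac{\beta}{f_{opt}} = \sqrt{\alpha\beta}.
\end{align*}
```

In this model the two frequency-dependent terms are equal at the minimum, which is consistent with the statement that the leakage and adiabatic losses are about equal at the energy-optimal frequency.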
V. CONCLUSIONS
We described the design of an nRERL microprocessor to verify the usefulness and energy efficiency of nRERL. The nRERL microprocessor is a fully adiabatic microprocessor, which uses only adiabatic components for all its functions. However, the energy consumption was not reduced as much as in the previous work [1, 4, 5]. The energy consumption of the nRERL microprocessor was 26.22 pJ at $V_{dd}$ = 2.5 V and f = 440 kHz, which, in the simulation comparison, was about 13.6 % of the energy consumed by the conventional CMOS one. This result is encouraging, although the direct comparison is rough.
In the future, low-power techniques such as dynamic voltage scaling and power-down should be considered further.
