ABSTRACT: Charge recycling has been proposed as a strategy to reduce the power dissipation in data buses. Previous work in this area was based on simplified bus models that ignored the coupling between the lines. Here we propose a new Charge Recycling Technique (CRT) appropriate for sub-micron technologies. CRT is analyzed mathematically using a bus energy model that captures the energy loss due to strong line-to-line capacitive coupling. In theory CRT can reduce the energy by a factor of 2. It becomes even more energy efficient when combined with Bus Invert coding (Stan '97, [6]). A circuit has been designed and simulated with all parasitic elements extracted from the layout. Taking into account the circuit energy overhead, the net energy saving can be up to 32%.
INTRODUCTION
Over the past several years, significant emphasis has been placed on reducing the energy dissipation associated with on-chip communication. Numerous schemes have been presented for reducing the energy associated with driving wires, including low swing signaling [1], [2], [3], charge recycling [4], [5] and data coding [6], [7], [8]. In this paper we introduce a new practical Charge Recycling Technique (CRT) appropriate for sub-micron technology buses. Its performance is verified by both mathematical analysis and circuit implementation. The technique, based on charge recycling between the lines, consists of two steps. During the first step, charge redistribution takes place between the lines whose logical values are changing during the transition. All other lines remain connected to their drivers. During the second step, all lines are driven to the voltages corresponding to their new logical values. A similar technique was presented in [4] and [5] but for the case where there is no coupling between the lines. This difference is essential. In sub-micron technology the strong capacitive coupling between the lines must be taken into account since it dramatically changes the energy consumption during bus transitions (with or without CRT). For this purpose we use the sub-micron bus energy equivalent model presented in [7]. In this paper we also present a driver that implements CRT. Its operation is demonstrated on a 4-line and an 8-line bus using HSPICE. The driver works at 100 MHz and results in a net energy saving of up to 32%. The circuit is directly expandable to larger buses. Large buses of 32 or 64 lines and FPGA interconnect networks are expected to have higher net energy savings with CRT (Rabaey [11]). Standard 0.18 µm CMOS technology has been used.
ISLPED'01,

SUB-MICRON BUS MODEL
The sub-micron bus energy model we use to evaluate the different energies is shown in Figure 1. It has been proven in [7] that this model has identical energy behavior to its distributed version. $C_L$ is the capacitance between each line and the ground and $C_I$ is the capacitance between adjacent lines. (The capacitance between non-adjacent lines is very weak and can be ignored.) We define the technology dependent parameter $h = C_I / C_L$. For 0.18 µm technology, h is about 5. Also, h tends to increase with technology scaling.
To simplify the theoretical analysis we set $V_{dd} = 1$. All energies calculated under this assumption must then be multiplied by the factor $V_{dd}^2$ to give the real energy value.
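The lumped model described above can be written down as a small capacitance matrix. The following is a minimal sketch, not the paper's own code: the function names `capacitance_matrix` and `denormalize` are hypothetical, and the matrix layout (each row holding the ground capacitance plus the coupling to each adjacent line, with negative off-diagonal coupling terms) is an assumption derived from the description of Figure 1.

```python
def capacitance_matrix(n, h, c_l=1.0):
    """n x n capacitance matrix of the lumped bus model (assumed layout).

    Line i sees c_l to ground and h * c_l to each adjacent line, so the
    charge on line i is sum_j cm[i][j] * V[j].
    """
    c_i = h * c_l  # inter-line capacitance, from h = C_I / C_L
    cm = [[0.0] * n for _ in range(n)]
    for i in range(n):
        neighbours = [j for j in (i - 1, i + 1) if 0 <= j < n]
        cm[i][i] = c_l + c_i * len(neighbours)
        for j in neighbours:
            cm[i][j] = -c_i
    return cm


def denormalize(energy_normalized, vdd):
    """Scale an energy computed with V_dd = 1 back to a real supply voltage."""
    return energy_normalized * vdd ** 2


if __name__ == "__main__":
    for row in capacitance_matrix(4, 5):
        print(row)
```

With h = 5 the edge lines have a diagonal entry of 6 and the interior lines 11 (in units of $C_L$), which illustrates how strongly the coupling dominates the ground capacitance.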
BUS DRIVER MODEL
The drivers are modeled as in Figure 2 [9]. The two time-varying resistors of each driver correspond to its PMOS and NMOS transistors. Their values can be almost arbitrary functions of time.
The switches $s_i$ and $\bar{s}_i$ are complementary and their state corresponds to the desired values of the lines. The parasitic capacitances of the driver outputs can be lumped into $C_L$.
CHARGE RECYCLING
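In this section we present the two steps of CRT. Suppose the bus has n lines and let T be the clock cycle period. New data is transmitted every clock cycle. The first step of CRT takes place during the first half of the cycle (Int_1) and the second step during the second half (Int_2). Suppose that the bus switches from its current values, $x = [x_1, x_2, \ldots, x_n]^T$, to its new values, $y = [y_1, y_2, \ldots, y_n]^T$, so that the line voltages at the beginning and at the end of the cycle are $V(0) = x$ and $V(T) = y$ respectively. The CRT is presented in Figure 4 with the modified driving circuit. We agree that switch $w_i$ has value 0 if node i is connected to the output of driver i, and value 1 if node i is connected to the common node q. During Int_1 the lines that change logical value are connected to the common node q, while all other lines remain connected to their drivers.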
First Step (Int 1)
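For every line i = 1, ..., n we set $d_i = x_i \oplus y_i$, so that $d_i = 1$ exactly when line i changes logical value. We also use the vector $d = [d_1, \ldots, d_n]^T$ and the diagonal matrix $D = \mathrm{diag}(d_1, \ldots, d_n)$. Here we assume that the time length T/2 is sufficient for the voltages of the network to settle. This assumption is reasonable for the current technology and is always used in charge redistribution (and adiabatic) techniques. So for i = 1, ..., n the voltage $V_i(T/2)$ equals $x_i$ for the lines that remain connected to their drivers, and it settles to a common value z that is of course the same for all lines that change logical value. Algebraically we have that $V_i(T/2) = (1 - d_i)\,x_i + d_i\,z$, or in vector form,
$$V(T/2) = (I - D)\,x + z\,d \qquad (7)$$
where I is the n x n identity matrix. The matrix D and the vector d are as defined before. From (6), (7) and $V(0) = x$ we obtain the value of z given in (9).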
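The common voltage z can be recovered from charge conservation. What follows is a sketch under assumptions that are not spelled out in the text above: that $C_T$ denotes the capacitance matrix of the bus model of Figure 1 (so the charge on line i is the i-th entry of $C_T V$), that the settled voltages are $V(T/2) = (I - D)\,x + z\,d$, and that the total charge of the lines tied to the common node q does not change during Int_1.

```latex
\begin{align*}
d^T C_T\, V(T/2) &= d^T C_T\, V(0)
  && \text{(total charge on node $q$ is conserved)} \\
d^T C_T \big[(I - D)\,x + z\,d\big] &= d^T C_T\, x
  && \text{(settled voltages and $V(0) = x$)} \\
z \; d^T C_T\, d &= d^T C_T\, D\,x \\
z &= \frac{d^T C_T\, D\,x}{d^T C_T\, d}
  && \text{(defined when at least one line changes, $d \neq 0$)}
\end{align*}
```

Under these assumptions, for h = 0 the matrix $C_T$ is diagonal and z reduces to the fraction of the transitioning lines that start at logical 1.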
Energy Dissipation on Step 1
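Here we evaluate the energy that is drawn from $V_{dd}$ during the first step of CRT. The current $I_{V_{dd}}(t)$ drawn from $V_{dd}$ during Int_1 is the sum of the currents drawn by the lines that do not change logical value during the transition and remain connected to $V_{dd}$ through their drivers, i.e. the lines i = 1, ..., n for which $x_i = y_i = 1$. So,
$$I_{V_{dd}}(t) = x^T (I - D)\, I(t) \qquad (10)$$
(The symbol I is used for both the current vector and the identity matrix. It should be clear what I represents each time.) Using equations (4) and (10) we have that,
$$I_{V_{dd}}(t) = x^T (I - D)\, C_T \frac{dV(t)}{dt} \qquad (11)$$
where $C_T$ is the capacitance matrix of the bus model. Because of the normalization $V_{dd} = 1$ the energy drawn from $V_{dd}$ during step 1 is $E_1 = \int_0^{T/2} I_{V_{dd}}(t)\,dt$. By replacing (11) we get $E_1 = x^T (I - D)\, C_T \big(V(T/2) - V(0)\big)$. Finally we use (7) to get,
$$E_1 = x^T (I - D)\, C_T\, (z\,d - D\,x) \qquad (12)$$
And by replacing z from (9) into (12) we obtain the closed-form expression (13) for $E_1$.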
Second Step (Int 2)
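During the second step of the CRT, the time interval Int_2 (the second half of the clock cycle), all lines are connected to their drivers and are driven to the voltages corresponding to their new logical values.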
Energy Dissipation on Step 2
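During Int_2 the current drawn from $V_{dd}$ equals the sum of the currents $I_i(t)$ of the lines connected to $V_{dd}$ (through their drivers). So $I_{V_{dd}}(t) = \sum_{i:\,y_i = 1} I_i(t) = \sum_{i=1}^{n} y_i\, I_i(t)$, or in vector form, $I_{V_{dd}}(t) = y^T I(t)$.
Finally, integrating over Int_2 and using $V(T) = y$, (7) and (9), we obtain expression (17) for the energy of the second step, $E_2 = y^T C_T \big(y - (I - D)\,x - z\,d\big)$, with z given by (9).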
ENERGY PROPERTIES OF CRT
The total energy $E(x, y)$ drawn from $V_{dd}$ during the transition $x \to y$ is of course $E(x, y) = E_1(x, y) + E_2(x, y)$. Using the identity $(I - D)\,x = (I - D)\,y$ and expressions (13) and (17) we get,
$$E(x, y) = y^T C_T\, (y - x) + y^T D\, C_T\, (D\,x - z\,d) \qquad (18)$$
where z is given by (9). The first term of the right-hand side of (18) is the energy drawn from $V_{dd}$ during the same transition when CRT is not used.
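Formula (18) is straightforward to evaluate numerically. The sketch below is illustrative only and rests on stated assumptions: the capacitance matrix is the tridiagonal matrix implied by the bus model of Figure 1, z is taken from charge conservation on the common node, $z = d^T C_T D x / (d^T C_T d)$ (set to 0 when no line changes), and $V_{dd} = C_L = 1$. The names `capacitance_matrix`, `quad` and `crt_energy` are hypothetical.

```python
def capacitance_matrix(n, h):
    """Assumed capacitance matrix of the bus model, in units of C_L."""
    cm = [[0.0] * n for _ in range(n)]
    for i in range(n):
        neighbours = [j for j in (i - 1, i + 1) if 0 <= j < n]
        cm[i][i] = 1.0 + h * len(neighbours)
        for j in neighbours:
            cm[i][j] = -float(h)
    return cm


def quad(u, m, v):
    """Bilinear form u^T . m . v for plain Python lists."""
    return sum(u[i] * m[i][j] * v[j]
               for i in range(len(u)) for j in range(len(v)))


def crt_energy(x, y, h):
    """Return (E with CRT, E without CRT, z) for the transition x -> y."""
    cm = capacitance_matrix(len(x), h)
    d = [xi ^ yi for xi, yi in zip(x, y)]        # 1 where the line changes
    dx = [di * xi for di, xi in zip(d, x)]       # D . x
    denom = quad(d, cm, d)
    z = quad(d, cm, dx) / denom if denom else 0.0
    diff = [yi - xi for xi, yi in zip(x, y)]     # y - x
    e_plain = quad(y, cm, diff)                  # first term of (18)
    yd = [yi * di for yi, di in zip(y, d)]       # y^T . D
    corr = [dxi - z * di for dxi, di in zip(dx, d)]
    return e_plain + quad(yd, cm, corr), e_plain, z


if __name__ == "__main__":
    print(crt_energy([1, 0], [0, 1], 5))
```

For the two-line transition [1, 0] to [0, 1] with h = 5, this sketch gives an energy of 5.5 with CRT against 11 without it, i.e. exactly one half, with the two lines meeting at z = 0.5.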
ENERGY REDUCTION
The result for the transition energy, formula (18), allows us to estimate numerically the expected energy drawn by the bus when CRT is used. We do this for the case of uniformly distributed i.i.d. data. In Figure 7 we see the expected energy using CRT as a percentage of the expected energy without CRT for the cases of n = 2, 4, 8, 16, 32, 64, 128, 256 and h = 0, 5, 10. The figure suggests that for n = 32, 64, 128, 256 lines the energy drawn from $V_{dd}$ can be reduced to one half using CRT. Also, the results are independent of the capacitance to ground $C_L$ and they slightly improve when h increases. In general h tends to increase with technology scaling.
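For small buses the expected ratio can be computed exactly by enumerating all transitions. The sketch below does this under the same assumptions as formula (18) with $V_{dd} = C_L = 1$: a tridiagonal capacitance matrix implied by the bus model of Figure 1 and z obtained from charge conservation on the common node. All function names are hypothetical, and the enumeration is only practical for small n since it visits $4^n$ transitions.

```python
from itertools import product


def capacitance_matrix(n, h):
    """Assumed capacitance matrix of the bus model, in units of C_L."""
    cm = [[0.0] * n for _ in range(n)]
    for i in range(n):
        neighbours = [j for j in (i - 1, i + 1) if 0 <= j < n]
        cm[i][i] = 1.0 + h * len(neighbours)
        for j in neighbours:
            cm[i][j] = -float(h)
    return cm


def quad(u, m, v):
    """Bilinear form u^T . m . v for plain Python sequences."""
    return sum(u[i] * m[i][j] * v[j]
               for i in range(len(u)) for j in range(len(v)))


def energies(x, y, cm):
    """Return (energy with CRT, energy without CRT) for x -> y."""
    d = [xi ^ yi for xi, yi in zip(x, y)]
    dx = [di * xi for di, xi in zip(d, x)]
    denom = quad(d, cm, d)
    z = quad(d, cm, dx) / denom if denom else 0.0
    e_plain = quad(y, cm, [yi - xi for xi, yi in zip(x, y)])
    yd = [yi * di for yi, di in zip(y, d)]
    corr = [dxi - z * di for dxi, di in zip(dx, d)]
    return e_plain + quad(yd, cm, corr), e_plain


def expected_ratio(n, h):
    """Average CRT energy over average plain energy, uniform i.i.d. data."""
    cm = capacitance_matrix(n, h)
    total_crt = total_plain = 0.0
    for x in product((0, 1), repeat=n):
        for y in product((0, 1), repeat=n):
            e_crt, e_plain = energies(x, y, cm)
            total_crt += e_crt
            total_plain += e_plain
    return total_crt / total_plain


if __name__ == "__main__":
    for n in (2, 4, 8):
        print(n, [round(expected_ratio(n, h), 4) for h in (0, 5, 10)])
```

For n = 2 the enumeration can be checked by hand: the ratio is (7 + 6h) / (8 + 8h), i.e. 0.875 for h = 0 and 37/48 for h = 5, so even on this smallest bus the saving grows with the coupling ratio h.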
CRT AND BUS-INVERT
In the previous sections we showed how CRT reduces energy consumption. In Figure 8 we present an architecture where CRT is combined with Bus-Invert coding [6]. [...] or more the energy saving is more than 50%.
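As a reminder of what the coding stage contributes, here is a minimal sketch of Bus-Invert coding on its own. It follows the usual formulation of the scheme (invert the word and raise an extra invert line whenever more than half of the data lines would toggle); it does not reproduce the architecture of Figure 8, and the function names are hypothetical.

```python
def bus_invert_encode(bus_state, data):
    """Return (word_to_send, invert_bit) for one Bus-Invert step.

    bus_state: bits currently on the n data lines.
    data:      the next n-bit word to transmit.
    """
    n = len(data)
    hamming = sum(b ^ w for b, w in zip(bus_state, data))
    if 2 * hamming > n:
        # More than half of the lines would toggle: send the complement.
        return [1 - w for w in data], 1
    return list(data), 0


def bus_invert_decode(word, invert_bit):
    """Recover the original data word at the receiver."""
    return [1 - w for w in word] if invert_bit else list(word)


if __name__ == "__main__":
    word, inv = bus_invert_encode([0, 0, 0, 0], [1, 1, 1, 0])
    print(word, inv, bus_invert_decode(word, inv))
```

Bus-Invert bounds the number of data lines that toggle in a cycle to at most n/2, whereas CRT lowers the energy of the transitions that do occur.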
A CIRCUIT FOR CRT DRIVERS
To verify CRT we designed a circuit that implements the conceptual network of Figure 4. Our circuit implementation consists of the bus and the CRT drivers of the lines. The CRT driver detects the transition of the line and connects it either to the common node q or to its regular driver (a chain of inverters). The proposed CRT driver was designed and laid out in 0.18 µm technology and its schematic is shown in Figure 10. Using this driver we tested CRT for a 4-line and an 8-line bus. The layouts of both the CRT and the standard drivers for the two cases are shown in Figure 11. The CRT driver, which realizes the switches $w_1, w_2, \ldots$ of Figure 4, operates as follows. The charge recycling phase begins when CLK becomes 1. A negative spike appears at the output of the XNOR gate if the input $x_i$ changes value. This sets the latch and connects the line to the common node q through the transmission gate. The charge recycling phase ends when CLK becomes 0. This resets the latch, isolates the line from the common node q and connects it to the buffer chain. If the input $x_i$ does not make a transition, the latch remains reset during the whole clock cycle and the line remains connected to the buffer chain.
The same circuit can be used unchanged for buses with an arbitrary number of lines.
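The switching rule of the driver can be summarized by a purely behavioral model. The sketch below abstracts away the XNOR spike, the latch and all timing, keeping only the resulting connection of each line. The function names and the "q" / "driver" labels are hypothetical.

```python
def switch_position(prev_bit, new_bit, clk):
    """Connection of one line: 'q' (common node) or 'driver' (buffer chain).

    The latch is set only while CLK is 1 and the input has changed value;
    when CLK returns to 0 the latch resets and the line goes back to its
    buffer chain.
    """
    latch_set = clk == 1 and prev_bit != new_bit
    return "q" if latch_set else "driver"


def bus_switches(prev_word, new_word, clk):
    """Apply the same rule independently to every line of the bus."""
    return [switch_position(p, n, clk) for p, n in zip(prev_word, new_word)]


if __name__ == "__main__":
    print(bus_switches([1, 0, 0, 1], [0, 1, 0, 1], clk=1))
    print(bus_switches([1, 0, 0, 1], [0, 1, 0, 1], clk=0))
```

Because each line decides from its own input alone, the rule is the same for any bus width.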
SIMULATION AND RESULTS
The CRT drivers of Figure 10 were used to drive the lines of a four-line and an eight-line bus (n = 4 and n = 8). A netlist was extracted from the layout of the drivers for the simulation with HSPICE. The lines were modeled as in Figure 1 and for the capacitor $C_L$ we used the values 50 fF, 100 fF, 150 fF and 200 fF. Note that these values could represent not only the line capacitors but all the loads as well. This is particularly the case in reconfigurable interconnect networks (e.g. in FPGAs), where long buses are loaded by the parasitic capacitances of several MOSFETs, resulting in total capacitive loads of a few picofarads [11]. The clock frequency in the simulations was set to 100 MHz and the buses were fed with uniformly distributed i.i.d. sequences of data. In Figure 13 we see the average energy using CRT as a percentage of the average energy without CRT for the 4-line and 8-line buses. Again, the ratios are parameterized by $C_L$. The flat lines correspond to the minimum possible ratios resulting from the theoretical analysis and shown in Figure 7.
As expected, for higher capacitive loads we get higher percentages of energy saving. This is because the average energy per cycle of the additional circuitry of the drivers is relatively independent of the loads. For larger loads this additional energy becomes less significant.
Finally, it is interesting to look at the waveforms of the individual lines during the two steps of the CRT. Figure 14 shows the waveforms of the line voltages of the 4-line bus. In this particular case, one line experiences a 1 → 0 transition and the other three lines make a 0 → 1 transition. Since all lines transition, they are all connected first to the common node q. The final voltage at node q during the charge redistribution period is of course z (equation (9)) and corresponds to the converging point of the waveforms at time T/2 = 5 ns.
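The converging point for this particular transition can be estimated with a few lines of code. This is a sketch under assumptions, not a reproduction of the HSPICE result: it uses the ideal lumped model of Figure 1 with $V_{dd} = C_L = 1$, takes z from charge conservation on the common node ($z = d^T C_T D x / (d^T C_T d)$), and ignores driver parasitics. Which of the four lines is the falling one is chosen arbitrarily, and the function names are hypothetical.

```python
def capacitance_matrix(n, h):
    """Assumed capacitance matrix of the bus model, in units of C_L."""
    cm = [[0.0] * n for _ in range(n)]
    for i in range(n):
        neighbours = [j for j in (i - 1, i + 1) if 0 <= j < n]
        cm[i][i] = 1.0 + h * len(neighbours)
        for j in neighbours:
            cm[i][j] = -float(h)
    return cm


def settled_voltages(x, y, h):
    """Return (z, V(T/2)) at the end of the charge redistribution step."""
    n = len(x)
    cm = capacitance_matrix(n, h)
    d = [xi ^ yi for xi, yi in zip(x, y)]       # lines tied to node q
    dx = [di * xi for di, xi in zip(d, x)]
    num = sum(d[i] * cm[i][j] * dx[j] for i in range(n) for j in range(n))
    den = sum(d[i] * cm[i][j] * d[j] for i in range(n) for j in range(n))
    z = num / den if den else 0.0
    # Unchanged lines stay at x_i, changing lines settle to z.
    v_half = [(1 - di) * xi + di * z for di, xi in zip(d, x)]
    return z, v_half


if __name__ == "__main__":
    # One line falls (1 -> 0) and three lines rise (0 -> 1).
    print(settled_voltages([1, 0, 0, 0], [0, 1, 1, 1], h=5))
```

In this idealized model the four lines meet at one quarter of $V_{dd}$ regardless of h: when every line is tied to q no inter-line capacitor sees a voltage change at the settled point, so only the ground capacitances share the charge of the single line that started at 1.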
It is interesting to note that for an individual transition the maximum energy saving with CRT occurs when all lines transition and 
CONCLUSIONS
A Charge Recycling Technique (CRT) for sub-micron buses has been proposed and analyzed. Closed-form results for the transition energy have been given and used for the theoretical evaluation of the energy reduction with CRT. In theory, applying both CRT and Bus Invert coding can reduce the average transition energy by a factor of more than 2. A line driver has been designed to implement CRT. Using it in an 8-line bus we have demonstrated net energy savings of up to 32%. Larger buses of 32 or 64 lines are expected to have higher energy savings.
