Interconnects have become an increasing concern in recent years, prompting novel techniques such as current sensing. However, these techniques must be designed to trade off delay against both dynamic and static power consumption. This paper presents an approach to reducing static power in differential current-sensed interconnects. The approach uses a self-timed shut-off system to reduce the static currents used to bias the current sense amplifier. Results indicate that the self-timed shut-off system reduces static power by 23.4% for a 10mm line in 250nm technology with no performance overhead. On average, it reduces static power by 9.7% for 4mm-9mm lines across 180nm, 130nm, 100nm and 65nm technologies, and by 6% for 10mm-15mm lines across the same set of technologies. The physical design of the system was implemented in 250nm technology, together with a test circuit that is ready to be fabricated. Extensions of this shut-off mechanism may be useful for mitigating leakage power in a variety of interconnect circuits.
INTRODUCTION
Following Moore's law and current scaling trends, [10] predicts that power dissipation in microprocessors is clearly becoming impractical, stressing the need for innovative circuit designs that reduce active and leakage power. Even with aggressive $V_{dd}$ scaling in the nanometer realm, the gap between the power budgets required by power-aware, energy-aware designs and the power actually achieved remains large. Global interconnects are widely accepted as a limiting factor for performance, since interconnect delay increases quadratically with wire length [8]. Both voltage-mode and current-mode techniques have been used to curb this growth in interconnect delay.
* This work was funded by Semiconductor Research Corporation under the contracts 766 and 1075, and a grant from Intel Corporation.
Interconnect delay has typically been reduced from quadratic to linear using repeater insertion techniques [8]. However, Sylvester [6] predicts that the total number of repeaters, around 50,000 in 250nm technology, increases to around 700,000 in 70nm technology. In addition, the power due to repeaters alone increases from 8W in 250nm to 60W in 70nm, more than a sevenfold increase. Alternative designs based on current-mode circuits have been proposed for global interconnects, including differential current sensing [1, 9] and other low-swing signaling methods [7]. For example, [5] cites that the Alpha 21264 reduced the worst-case power on buses by limiting the voltage swing to 10% of $V_{dd}$. One such current-mode design, proposed by [1], is the differential current sense amplifier. It has been shown to perform better in terms of delay in some cases [1] and, when combined with 25% repeaters, to perform 30% better than conventional delay-optimal repeaters [2]. Its major drawback, as with most differential amplifiers, is static power dissipation.
This work presents a power-aware version of the differential current sense amplifier originally proposed by [1], combined with a self-timed shut-off system that reduces static power dissipation. The approach applies a self-timed shut-off system of the kind prevalent in memory circuits [3], in which the sense amplifier is disabled after sensing is done and enabled before the next sensing begins. This paper is organized as follows. Section 2 provides the basics of differential current sensing and analyzes its components of power dissipation. Section 3 illustrates the design of the self-timed shut-off system and compares its power with the original work. Section 4 describes the physical design implementation, and Section 5 concludes the paper.
DIFFERENTIAL CURRENT SENSE AMPLIFIER

Circuit and Operation
The differential current sense amplifier (DCSA), a circuit technique that uses current-mode signaling, can be applied to global interconnects. The DCSA, shown in Figure 1, provides a low-impedance termination at the receiver. In Figure 1, the drains of M5 and M6 are used for the low-impedance termination, which effectively clamps the interconnect at a specified voltage. Standard inverter-based drivers provide the current push or pull to the voltage-clamped interconnect wires. Two interconnect wires connect the signal and its complement to the two inputs of the DCSA.
Power Analysis
There are three major sources of power dissipation in the DCSA circuit, given by Equation 1:
$$P_{DCSA} = P_{dynamic} + P_{direct\text{-}path} + P_{leakage} \qquad (1)$$
$P_{dynamic}$ refers to the switching power due to the charging and discharging of the interconnect capacitance. $P_{direct\text{-}path}$ refers to the power dissipated through the direct path between $V_{dd}$ and ground, shown in Figure 2 (dashed arrow); throughout this paper it is referred to as static power. The third component, the leakage power $P_{leakage}$, is due to sub-threshold conduction, which is predominant in nanometer technologies. Although leakage power cannot be discounted from the total power dissipation $P_{DCSA}$, static power dissipation is emphasized here because of its larger contribution to the total. When only the interconnect is considered, the leakage power contribution is an order of magnitude smaller than static and dynamic power.
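As a rough illustration of the dynamic component, the sketch below uses the common low-swing switching-power estimate (activity × capacitance × supply voltage × signal swing × frequency). The formula choice and every numeric default (capacitance per millimeter, supply, swing, frequency) are assumptions for illustration only and are not values from this paper.

```python
def dynamic_power_mw(length_mm, activity=0.5, c_ff_per_mm=200.0,
                     v_dd=2.5, v_swing=0.25, freq_hz=1e9):
    """Toy estimate of switching power for a low-swing wire, in milliwatts.

    All default values are hypothetical placeholders, not measured data.
    """
    c_total = c_ff_per_mm * length_mm * 1e-15  # total wire capacitance in farads
    # Charge C * v_swing is drawn from the v_dd supply on each charging event.
    return 1e3 * activity * c_total * v_dd * v_swing * freq_hz
```

Under these assumptions the estimate scales linearly with both line length and activity factor.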
In Figure 2, the static power path is shown with the dashed arrow. The NMOS device in this path is "always on", so there is a continuous path to ground whenever the driver PMOS is on. Figure 3 shows the static and dynamic power of the DCSA for different line lengths; leakage power is an order of magnitude less than static and dynamic power and is hence not shown. Static power is given by the dark bars and dynamic power by the light bars. Note that dynamic power increases with line length, since the interconnect capacitance increases with line length. Dynamic power also depends on the activity factor, which is assumed to be 50%. Static power, the darker shade in Figure 3, increases until 4mm and then decreases monotonically. This is due to the flow of static current through the two resistances given in Equation 2:
$$R_{total} = R_{int} + R_{nmos} \qquad (2)$$
where $R_{int}$ is the interconnect resistance and $R_{nmos}$ is the resistance of M5 or M6 from Figure 1, each of which is the same size as the driver NMOS for each wirelength.
The W/L ratio of M5 and M6 increases as wirelength increases. Up to 4mm, the resistance $R_{nmos}$ of M5 and M6 dominates $R_{total}$; because it decreases from 1mm to 4mm, the static current increases and so does the static power. Beyond 4mm, the interconnect resistance $R_{int}$ starts to dominate; because it increases with wirelength, the static current decreases and so does the static power.
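The rise-then-fall behavior of static power can be reproduced with a toy series-resistance model. The sketch below assumes that the NMOS resistance falls inversely with line length (because the devices are widened for longer wires) and that a fixed voltage drops across the series path; the resistance coefficients and the voltage are hypothetical values chosen so that the peak lands at 4mm, not extracted from the design.

```python
def total_resistance_ohm(length_mm, r_int_per_mm=50.0, r_nmos_at_1mm=800.0):
    """Series resistance of the static path (hypothetical coefficients)."""
    r_int = r_int_per_mm * length_mm     # wire resistance grows with length
    r_nmos = r_nmos_at_1mm / length_mm   # assumed: device width scales with length
    return r_int + r_nmos

def static_power_mw(length_mm, v_path=1.0):
    """Static power for an assumed fixed voltage v_path across the path."""
    return 1e3 * v_path ** 2 / total_resistance_ohm(length_mm)

lengths_mm = range(1, 16)
# Line length at which the modeled static power peaks.
peak_length_mm = max(lengths_mm, key=static_power_mw)
```

With these placeholder coefficients the device resistance dominates below 4mm and the wire resistance dominates above it, so the modeled static power peaks at 4mm.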
SELF-TIMED SHUT-OFF SYSTEM
Circuit and Operation
In order to reduce the static power $P_{direct\text{-}path}$, it is necessary to disable the direct path through the NMOS devices M5 and M6. Figure 4 shows the modified power-aware version of the DCSA (PA-DCSA). As seen in Figure 4, additional NMOS devices have been added to the original DCSA. The first set of additional devices is the decoupling devices M9 and M10, which decouple the cross-coupled inverters from Path B. They are gated by the sense enable (SE) signal, which enables and disables the sense amplifier. The next set is the discharge devices M11 and M12. They hold the drains of the cross-coupled inverter NMOS devices M3 and M4 at a virtual ground and discharge the excess charge accumulated at those drains after the sense amplifier is disabled. The last additional device is M8, also gated by SE, which provides the new low-impedance path in series with M5 or M6. When SE goes low, M8 turns off and shuts off Path A.
The operation of the PA-DCSA is similar to that of the DCSA. When sense enable (SE) is high along with the equalizing signal (EQ), M8 is on and, together with M5 and M6, which are "always on", provides the low-impedance path (Path A) needed for the differential to form. Once the differential is formed, EQ goes low and the cross-coupled inverter pair senses the differential and swings OUT and $\overline{OUT}$ to their respective voltages. After a delay, SE goes low, turning M8 off and thereby shutting off the direct path to ground, which ideally disables the sense amplifier. The decoupling devices are also off when SE is low, so any current entering through node IN is re-routed through M5 and M6 and driven to $\overline{IN}$ on the other interconnect (Path B). The charge collected on nodes C1 and C2 is discharged through one of the discharge devices M11 and M12, depending on whether OUT or $\overline{OUT}$ is high. Once EQ goes high, SE goes high at once, turning M8 on and thereby enabling the sense amplifier for the next sensing cycle.
The self-timed mechanism explained in Section 4 derives the sense enable signal SE internally from the edges of EQ. When the sense amplifier is disabled by device M8, the static current follows Path B, which is formed by devices M5 and M6. Since the continuous static power path that existed before is no longer present, static power dissipation is reduced. There is still a discharge path to ground through the discharge devices M11 and M12, which discharge the intermediate charge accumulated at nodes C1 and C2. Once the outputs swing to their respective levels, one of the discharge paths is on, but there is no direct path from $V_{dd}$ to ground.

If IN is high and $\overline{IN}$ is low, Path B might allow a static current to flow from the PMOS driving IN to the NMOS sinking $\overline{IN}$, leading to a direct path from $V_{dd}$ to ground. However, this does not occur in practice, since the switching frequency is so high that there is not enough time to fully charge the capacitances of both interconnects. Another observation is that driving current through Path B charges the interconnect capacitances. If IN goes low in the next cycle, there will be a delay in discharging the accumulated charge. Conversely, if IN goes high, the interconnect will be charged quickly, since a small charge has already accumulated from the last cycle.

In the static power comparison plot, PA-DCSA and DCSA are shown as separate bars, with the light bar representing DCSA. From the plot it can be seen that static power dissipation is reduced when the self-timed shut-off system is used. Comparing static power rather than total power is more appropriate because the dynamic power of DCSA and PA-DCSA is the same: both use the same interconnect under the same switching activity of 50%.
A 10mm line can be considered a nominal length, since lengths greater than 15mm have to be pipelined to achieve better throughput and lengths less than 4mm are considered local interconnects. For a 10mm line, the self-timed shut-off system reduces static power from 1.85mW to 1.45mW, a 23.7% reduction. Figure 6 compares the energy consumed per switching event at 250nm for DCSA and PA-DCSA. PA-DCSA consumes less energy than DCSA by a roughly constant amount. Energy consumed per switching event, or power-delay product (PDP), is a quality measure for a switching circuit. From Figure 6, the energy consumed per switching event increases with line length due to the increase in interconnect capacitance.
Power Analysis
The energy per switching event is reported for a constant switching activity of 50% and a constant bandwidth, since energy varies with both. As Figure 6 shows, the energy increases with line length because of the increasing capacitive load of the interconnect. Figure 7 shows the static power savings of PA-DCSA over DCSA across technologies including 180nm, 130nm, 100nm and 65nm, expressed as the percentage reduction in static power obtained by applying the self-timed shut-off system. From Figure 7 it can be seen that the highest power savings are obtained at 250nm. On average there is a 24.5% static power reduction over the line-length range of 4mm-15mm. The average static power savings at 65nm is around 9.5%. Figure 8 details the percentage reduction in energy consumed per switching event at an activity of 50% for different technologies. For a 10mm line there is on average a 10% reduction in energy consumed per switching event, or PDP. For the 65nm design over 10mm there is actually an increase in the energy consumed, indicating a performance degradation when PA-DCSA is used instead of DCSA. This is due to the slow discharge of nodes C1 and C2 by the discharge devices M11 and M12.
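As a quick consistency check on the reported averages, the following sketch combines the two 250nm range averages stated in the conclusion (29.4% for 4mm-9mm and 19.7% for 10mm-15mm) into a single average over 4mm-15mm. It assumes the line lengths were sampled in 1mm steps, which is not stated explicitly; `combined_average` is a helper name introduced here for illustration.

```python
def combined_average(range_averages):
    """Weight each range's average by the number of line lengths it covers."""
    total_points = sum(count for count, _ in range_averages)
    weighted_sum = sum(count * avg for count, avg in range_averages)
    return weighted_sum / total_points

ranges_250nm = [
    (len(range(4, 10)), 29.4),   # 4mm-9mm, assumed 1mm steps (6 lengths)
    (len(range(10, 16)), 19.7),  # 10mm-15mm, assumed 1mm steps (6 lengths)
]
overall_percent = combined_average(ranges_250nm)
```

Under the 1mm-step assumption both ranges carry equal weight and the combined value is about 24.55%, which appears to correspond to the roughly 24.5% average quoted for the 4mm-15mm range.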
PHYSICAL DESIGN IMPLEMENTATION
The physical design of the test circuit was implemented in 250nm technology for wirelengths ranging from 1mm to 15mm using layout design tools. Figure 9 shows the simulation of the DCSA and PA-DCSA discussed in Sections 2 and 3, respectively. Panel A of Figure 9 shows the operation of the DCSA. It can be seen that the equalizing signal EQ goes low after a considerable difference has formed in the static currents. At the onset of EQ going low, OUT and $\overline{OUT}$, originally held at $V_{dd}/2$, swing to their respective rails. Panel B of Figure 9 shows the input currents of the DCSA and PA-DCSA along with EQ and the sense enable signal SE. When SE and EQ are high, the sense amplifier is enabled and both currents (DCSA and PA-DCSA) provide a similar differential. In this particular simulation, SE goes low 100ps after EQ goes low. While SE is low, one of the current values is reduced and the direction of the other current is reversed with the same magnitude (Path B). This reduction in static current effectively reduces static power.
A test circuit was also implemented in 250nm technology; it can be used to test and verify operation and to measure output voltage levels. The system-level diagram of the test circuit in Figure 10 includes a vernier delay line sampler [4] to sample the output from the DCSA. The smaller circuits included in the test circuit as part of the vernier delay line sampler are single-to-differential converters, biased inverters and latches. The DCSA part of the test circuit has a driver driving the interconnect, with the DCSA terminating it. The outputs of the DCSA are fed to the vernier delay line sampler, which consists of a series of delay lines and samples its input (the DCSA outputs) at regular intervals so that the samples can be read off-chip. A simple circuit for generating the sense enable (SE) edge from EQ is also included in the test circuit and is shown in Figure 11. It consists of a chain of buffers that delays the EQ signal; the delayed copy is ANDed with the present copy of EQ to obtain SE. When EQ goes from low to high, SE goes high after the buffer delay. When EQ goes low, SE goes low immediately. The test circuit will be used to test the self-timed shut-off system once it has been fabricated.
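The SE generation described in this paragraph can be sketched behaviorally as follows. This is a minimal discrete-time model, assuming ideal gates, a buffer-chain delay expressed as a whole number of time steps (`delay_steps`, a hypothetical parameter), and EQ low before the first sample; it is not a description of the actual circuit in Figure 11.

```python
def sense_enable(eq, delay_steps):
    """Return SE[t] = EQ[t] AND EQ[t - delay_steps] for a sampled EQ waveform.

    EQ is assumed to be low (0) before the first sample.
    """
    kept = max(0, len(eq) - delay_steps)
    delayed = [0] * delay_steps + list(eq[:kept])   # buffer-chain output
    return [now & late for now, late in zip(eq, delayed)]

# Hypothetical EQ waveform: rises at sample 2, falls at sample 7.
eq_wave = [0, 0, 1, 1, 1, 1, 1, 0, 0, 0]
se_wave = sense_enable(eq_wave, delay_steps=2)
```

In this model SE rises `delay_steps` samples after EQ rises and falls in the same sample in which EQ falls, matching the edge behavior stated in the paragraph.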
CONCLUSION
In this paper we have presented a self-timed shut-off system for current-sensed interconnects aimed at reducing static power dissipation. We have presented results showing significant power savings from using the system on global interconnects in 250nm technology, and illustrated the savings obtained when the design is scaled. The self-timed shut-off system reduced static power on average by 29.4% for 4mm-9mm lines and by 19.7% for 10mm-15mm lines in 250nm technology. A test circuit was also laid out in 250nm technology and simulated along with the DCSA and PA-DCSA, so that the above results can be compared against measurements after fabrication. A small overhead of 5 devices, which translates into increased area, is a drawback of this power-aware version of the DCSA; applying the system therefore involves a tradeoff between area and power. Although leakage power is not prominent when a single interconnect line is considered, it can be significant for buses in which a differential low-swing system is used for every two lines. The self-timed shut-off system can be applied to other signaling methods where power reduction is an important consideration, and can be combined with other standard leakage power reduction methods to yield greater savings for power-aware microprocessors.
