In order to reduce the power dissipation of CMOS products, semiconductor manufacturers are reducing the power supply voltage. This requires that the transistor threshold voltages be reduced as well to maintain adequate performance and noise margins. However, this increases the subthreshold leakage current of p and n MOSFETs, which starts to offset the power savings obtained from power supply reduction. This problem will worsen in future generations of technology, as threshold voltages are reduced further. In order to overcome this, we propose a design technique that can be used during logic design in order to reduce the leakage current and power. We target designs where parts of the circuit are put in "standby" mode when not in use, which is becoming a common approach for low power design. The proposed design changes consist of minimal overhead circuitry that puts the circuit into a "low leakage standby state," whenever it goes into standby, and allows it to return to its original state when it is reactivated. We give an efficient algorithm for computing a good low leakage power state.
Introduction
As VLSI devices have grown in complexity and density, their power consumption has become a major design concern. High power consumption exacerbates reliability problems by raising the device operating temperature. It also increases the current density in the supply lines, causing greater electromigration problems. High power consumption also impacts battery-powered portable devices by requiring either large battery packs or unacceptably short operating time. These issues have forced designers to aggressively pursue low-power design methodologies.
It is well known that the power dissipation is directly proportional to the square of the power supply voltage, while it is proportional only to the first power of the capacitance and frequency ($P_{av} = C V^2 f$).
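As a rough illustration of this quadratic dependence (the two voltages are example values, and the capacitance $C$ and frequency $f$ are assumed unchanged), lowering the supply from 5 V to 3.3 V scales the switching power by:
$$\frac{P_{3.3\,\mathrm{V}}}{P_{5\,\mathrm{V}}} = \frac{C\,(3.3)^2 f}{C\,(5)^2 f} = \left(\frac{3.3}{5}\right)^2 \approx 0.44$$
That is a reduction of about 56%, whereas the same 34% relative reduction applied to $C$ or $f$ alone would save only 34%.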
Thus, much work has been done on the technology side to reduce the power supply voltage from 5 V to 3.3 V and below. This trend will likely continue in the future, and we may soon see 1 V or lower supplies [1]. However, reduction in the supply voltage has a negative effect on circuit performance. Propagation delays through logic gates increase as the supply voltage decreases, and the overall noise immunity of the circuit decreases.
† This work was supported by Intel Corp. and by a National Science Foundation Graduate Fellowship.
To reduce these undesirable effects, threshold voltages have also been lowered [1]. However, lower threshold voltages increase the leakage power dissipation caused by subthreshold conduction in the MOSFETs. For circuits with very low threshold devices, the standby (leakage) power is no longer negligible [1] and can even dominate the active and short-circuit power. Future circuits promise lower threshold voltages and even greater standby power dissipation. Thus, methods to manage or reduce the leakage power are needed.
A number of leakage reduction methods have already been proposed. Methods by Horiguchi, et al., [3] and Mutoh, et al., [4] use circuit and process level changes to reduce the leakage power. In this paper, we propose a novel design method that can be used during logic design in order to reduce the leakage power of CMOS circuits. This approach is useful for designs in which parts of the circuit are put into idle or sleep modes by holding the clock fixed at either low or high.
Such clock-gating schemes are quite common in low-power designs [2], so the proposed approach should be widely applicable.
Our approach is based on the observation that a CMOS logic gate in steady state draws a leakage current that depends on the gate input state. Thus, given a multi-gate logic circuit, we modify the original logic design, using minimal additional circuitry, to force the combinational logic into a low-leakage state during an idle period. To find such a low-leakage state, we have developed an efficient algorithm that determines a good (low-leakage) input vector using a sampling of random vectors. The size of the sample set is determined a priori using user-supplied quality measures. The algorithm is based on a gate library which we have characterized for leakage current. We have demonstrated this method on the ISCAS-89 benchmark circuits and shown leakage power reductions of up to 54%.
Our design methodology is given in section 2. Section 3 describes the gate library characterization process. An input vector selection technique is developed in section 4. Suitable latch designs are shown in section 5, and section 6 shows the results of our method.
Design Methodology
We propose to use a natural characteristic of static CMOS gates in order to reduce the leakage power. This is based on the observation that individual CMOS gates show a variation in the leakage power based on the input vector, as seen in Table 1. Larger combinational circuits composed of many static CMOS gates exhibit similar variation, as shown in Fig. 1, which displays the leakage power histogram for a 119-gate circuit over a population of 100,000 randomly chosen input vectors. The leakage power for the circuit varies by a factor of two from the minimum leakage power to the maximum. Thus the leakage power is dependent on the input vector applied to the circuit.
Modern low-power designs make extensive use of clock-gating. Clock-gating is a logic design method in which the clock is disabled to parts of the circuit during periods when they are not required to execute. These parts are said to be in standby mode, also called sleep mode or idle mode. The power supplies to these parts are not turned off, because of the performance and noise penalties that would result if this were done. Thus, whenever a circuit is put in standby, the latches or flip-flops inside it maintain the last state they were in. As a result, the circuit dissipates leakage power during standby corresponding directly to the logic state in which it was left.
Using minimal additional circuitry, we propose to modify the logic design so that whenever a circuit is put in standby, its internal state is set to a low-leakage state, preferably the lowest-leakage state possible. When that circuit is reactivated, the circuit will be returned to its last valid state. If the idle periods are long enough, this should lead to significant power reductions.
In section 4, we will present an efficient algorithm for finding a good low-leakage state. This has to be done only once, during the logic design phase. Once a good state vector has been determined, it can be multiplexed onto the logic inputs. Upon entering sleep mode, the state bits are selectively forced to 0 or 1, depending on the latch design used. Suitable latch designs in two common design styles are shown in section 5.
The search algorithm for a low-leakage input vector, to be given in section 4, depends on being able to estimate the leakage for a candidate vector. A circuit-level simulator, such as SPICE, can be used to do this, but a more efficient solution is possible since the leakage depends only on the steady state node values. Thus, it is more efficient to build library models for logic gates that give the leakage current drawn by a gate for each of its input combinations. We have done this by building look-up table models as in Table 1 for several kinds of logic gates and then running a simple steady state logic simulation in order to determine the leakage for a given circuit input vector.
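The look-up-table approach to leakage estimation can be sketched roughly as follows. This is a minimal illustration and not the authors' tool: the gate types, the netlist format, the names `LEAKAGE_LUT`, `LOGIC`, and `circuit_leakage`, and especially the leakage numbers are all assumptions made up for the example. Real table entries would come from characterizing each library cell with a circuit simulator.

```python
from typing import Callable, Dict, List, Tuple

# HYPOTHETICAL per-gate leakage tables (arbitrary units), indexed by the
# gate's input state. These placeholder numbers are NOT measured data.
LEAKAGE_LUT: Dict[str, Dict[Tuple[int, ...], float]] = {
    "INV":   {(0,): 1.0, (1,): 2.0},
    "NAND2": {(0, 0): 0.5, (0, 1): 1.5, (1, 0): 1.2, (1, 1): 3.0},
    "NOR2":  {(0, 0): 2.5, (0, 1): 1.4, (1, 0): 1.1, (1, 1): 0.4},
}

# Boolean function of each gate type, used for steady-state logic simulation.
LOGIC: Dict[str, Callable[..., int]] = {
    "INV":   lambda a: 1 - a,
    "NAND2": lambda a, b: 1 - (a & b),
    "NOR2":  lambda a, b: 1 - (a | b),
}

# A gate is (type, output net, input nets); the list must be in
# topological order so every input net is known before it is used.
Gate = Tuple[str, str, Tuple[str, ...]]


def circuit_leakage(netlist: List[Gate], inputs: Dict[str, int]) -> float:
    """Sum the table leakage of every gate for one primary-input vector."""
    values = dict(inputs)  # net name -> steady-state logic value
    total = 0.0
    for gate_type, out, ins in netlist:
        state = tuple(values[net] for net in ins)
        values[out] = LOGIC[gate_type](*state)   # propagate logic value
        total += LEAKAGE_LUT[gate_type][state]   # look up leakage for this state
    return total


# Tiny made-up circuit: y = NOR(NAND(a, b), INV(c))
EXAMPLE: List[Gate] = [
    ("NAND2", "n1", ("a", "b")),
    ("INV",   "n2", ("c",)),
    ("NOR2",  "y",  ("n1", "n2")),
]

if __name__ == "__main__":
    for vec in ({"a": 1, "b": 1, "c": 0}, {"a": 0, "b": 0, "c": 1}):
        print(vec, circuit_leakage(EXAMPLE, vec))
```

Because each vector costs only one pass over the netlist and one table look-up per gate, many candidate vectors can be evaluated far more cheaply than with a circuit-level simulation.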
Input Vector Determination Technique
Consider a combinational circuit whose input nodes are state bits of an overall sequential circuit which will be put in standby mode. We need to choose an input vector for the combinational circuit that causes it to dissipate very low leakage power. The "search problem" for the vector that gives the least leakage power is a very difficult one because of the potentially huge size of the search space. Furthermore, it is not absolutely necessary to find this minimizing vector. Instead, one is interested in a vector that gives a significantly lower value of leakage and which can be found efficiently. We have developed an algorithm to find such a vector based on a process of random sampling. Randomly chosen vectors are applied to the circuit and the leakage due to each is monitored, and the vector which gives the least observed leakage value is reported. It will be seen that this relatively simple approach works well, in the sense that a relatively small number of vectors is enough to come close to the lowest leakage current that would be observed from a much larger number of vectors.
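The random-sampling search just described can be sketched as below. This is an illustrative outline under stated assumptions rather than the authors' implementation: `find_low_leakage_vector`, its parameters, and the stand-in `toy_leakage` model are hypothetical names, and any leakage estimator that maps an input vector to a number could be passed in instead.

```python
import random
from typing import Callable, List, Sequence, Tuple


def find_low_leakage_vector(
    leakage_fn: Callable[[Sequence[int]], float],
    num_inputs: int,
    num_samples: int,
    seed: int = 0,
) -> Tuple[List[int], float]:
    """Apply num_samples uniformly random vectors and keep the best one."""
    if num_samples < 1:
        raise ValueError("num_samples must be at least 1")
    rng = random.Random(seed)  # seeded so the search is repeatable
    best_vec: List[int] = []
    best_leak = float("inf")
    for _ in range(num_samples):
        # Each bit is chosen independently, so all vectors are equally likely.
        vec = [rng.randint(0, 1) for _ in range(num_inputs)]
        leak = leakage_fn(vec)
        if leak < best_leak:
            best_vec, best_leak = vec, leak
    return best_vec, best_leak


def toy_leakage(vec: Sequence[int]) -> float:
    """HYPOTHETICAL stand-in leakage model, used only to exercise the search."""
    return 1.0 + sum((i + 1) * bit for i, bit in enumerate(vec))


if __name__ == "__main__":
    vec, leak = find_low_leakage_vector(toy_leakage, num_inputs=8, num_samples=59)
    print("best vector:", vec, "leakage:", leak)
```

The search keeps only the running minimum, so its memory use is independent of the number of vectors tried.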
Clearly, the number of vectors to be applied determines the "quality" of the resulting solution. In the following, we will derive a simple result (5) which gives the number $n$ of vectors required, a priori, for a given desired "quality" of the solution. To make this terminology more specific, we need to invoke a probabilistic view of the space of the Boolean vectors, as follows.
Consider the set of all possible input vectors to the circuit, and consider an experiment by which a vector is chosen at random, with all vectors having the same probability of being chosen. Thus, the Boolean space becomes a probability sample space. If $v$ is an input vector, define a random variable (RV) $X(v)$ to be the leakage current drawn by the circuit when $v$ is applied at its input under steady state. Let $f(x)$ and $F(x)$ be the probability density function (pdf) and the cumulative distribution function (cdf) of $X$, respectively. A typical $f(x)$ will be similar to the leakage power histogram in Fig. 1.
If we choose $n$ input vectors $v_1, v_2, \ldots, v_n$ at random, by making an independent choice every time, the leakage current obtained in each trial is a sample of an independent RV $X_i = X(v_i)$, $i = 1, \ldots, n$, with pdf $f(x)$. Since they are independent, the set $\{X_1, X_2, \ldots, X_n\}$ constitutes a random sample.
Define a new RV $Y = \min(X_1, X_2, \ldots, X_n)$, so that $Y$ is the (random) lowest leakage value over the random sample. Suppose we are able to establish that, for sufficiently large $n$, we have:
$$P\{F(Y) \le \epsilon\} \ge \alpha \tag{1}$$
where $P\{\cdot\}$ denotes probability, $0 \le \epsilon \le 1$ with $\epsilon \approx 0$, $0 \le \alpha \le 1$ with $\alpha \approx 1$, and $F(\cdot)$ is the cdf of $X$ as stated above. This probability statement (1) would allow us to make a statistical confidence statement about an observed value (or sample) of $Y$, which we will denote by $y$, as follows: with better than $\alpha$ confidence, we have $F(y) \le \epsilon$. This terminology is commonly used in statistics. Using the definition of the cumulative distribution function, i.e., $F(y) = P\{X < y\}$, this would lead to the statement that with better than $\alpha$ confidence, we have:
$$P\{X < y\} \le \epsilon \tag{2}$$
Thus, if we can determine a value of $n$ for which (1) holds, then an observed value $y$ of the RV $Y$ will have the following special property: with better than $\alpha$ confidence, $y$ is a leakage current value such that only a very small fraction (less than $\epsilon$) of vectors in the Boolean space have leakage currents less than $y$.
If we keep track of which input vector generated the leakage value $y$, that vector would be a good candidate for the desired low-leakage input vector. The quality of this vector would be determined by $\alpha$ and $\epsilon$.
We now determine the value of $n$ for which (1) holds for a given $\epsilon$ and $\alpha$. This requires the distribution of the RV $Z = F(Y)$, which is known to have a beta distribution from the field of reliability analysis and can be obtained from the rank distribution [5]:
$$f_Z(z) = \frac{n!}{(j-1)!\,(n-j)!}\, z^{j-1} (1-z)^{n-j} \tag{3}$$
with $j = 1$ and $0 \le z \le 1$. Integrating this pdf to determine the cdf of $Z$ yields:
$$P\{Z \le \epsilon\} = 1 - (1-\epsilon)^{n} \tag{4}$$
Setting this to be greater than $\alpha$, according to (1), leads to the desired result:
$$n \ge \frac{\ln(1-\alpha)}{\ln(1-\epsilon)} \tag{5}$$
Thus, in order to have 95% confidence ($\alpha = 0.95$) that less than 5% ($\epsilon = 0.05$) of the vector population has leakage current which is less than the least observed leakage ($y$) from $n$ trials, we need $n \ge \lceil \ln(0.05)/\ln(0.95) \rceil = 59$. For $\alpha$ = 99% confidence and $\epsilon$ = 1% error tolerance, we need at least $\lceil \ln(0.01)/\ln(0.99) \rceil = 459$ trials. In general, given a desired confidence $\alpha$ and tolerance $\epsilon$, one can determine $n$ a priori using (5).
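The a priori sample size can be computed with a few lines of code. The sketch below assumes the bound takes the form $n \ge \ln(1-\alpha)/\ln(1-\epsilon)$, which is consistent with the numeric example above. The function names `required_samples` and `achieved_confidence` are illustrative and do not come from the original work.

```python
import math


def required_samples(alpha: float, eps: float) -> int:
    """Smallest n with 1 - (1 - eps)**n >= alpha (assumed form of the bound)."""
    if not (0.0 < alpha < 1.0 and 0.0 < eps < 1.0):
        raise ValueError("alpha and eps must lie strictly between 0 and 1")
    return math.ceil(math.log(1.0 - alpha) / math.log(1.0 - eps))


def achieved_confidence(n: int, eps: float) -> float:
    """Probability that the best of n random vectors lies in the lowest eps fraction."""
    return 1.0 - (1.0 - eps) ** n


if __name__ == "__main__":
    n = required_samples(alpha=0.95, eps=0.05)
    print(n)                             # 59
    print(achieved_confidence(n, 0.05))  # slightly above 0.95
```

Note that the required $n$ depends only on $\alpha$ and $\epsilon$, not on the number of circuit inputs, which is what keeps the sampling approach practical for large circuits.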
Latch Designs
The circuitry used to force the low-power input vector onto the combinational logic should add as little speed and area overhead as possible to the design. Solutions such as pass-gate multiplexers and CMOS NAND and NOR gates can be used to force the outputs to the desired value during sleep mode. However, since the combinational circuit being analyzed is assumed to be part of a sequential network, the latches in the sequential network can be easily modified to force either a 0 or 1 during sleep mode. Fig. 2 shows standard static "jamb" latches [6] modified to force a value at the output during sleep mode. Fig. 3 shows modified dynamic CMOS flip-flops [7]. Clearly, other types of latches and flip-flops can also be modified in similar ways to force appropriate values during sleep mode.
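At the behavioral level, the intended effect of such a modified latch can be sketched as follows. This is only an abstraction of the behavior described in the text (a forced output bit during sleep, and the last valid state otherwise). It does not model the transistor-level circuits of Fig. 2 or Fig. 3, and the class and function names are hypothetical.

```python
from typing import List


class SleepForcingLatch:
    """Behavioral stand-in for a latch whose output is forced during sleep."""

    def __init__(self, sleep_value: int) -> None:
        if sleep_value not in (0, 1):
            raise ValueError("sleep_value must be 0 or 1")
        self.sleep_value = sleep_value  # bit taken from the chosen low-leakage vector
        self.stored = 0                 # last valid state captured while active

    def capture(self, d: int) -> None:
        """Latch a new data bit while the circuit is active."""
        self.stored = d

    def output(self, sleep: bool) -> int:
        """Forced bit in sleep mode; otherwise the last captured state."""
        return self.sleep_value if sleep else self.stored


def build_register(low_leakage_vector: List[int]) -> List[SleepForcingLatch]:
    """One latch per state bit, each wired to force its bit of the vector."""
    return [SleepForcingLatch(bit) for bit in low_leakage_vector]


def read_register(register: List[SleepForcingLatch], sleep: bool) -> List[int]:
    return [latch.output(sleep) for latch in register]


if __name__ == "__main__":
    reg = build_register([0, 1, 1])        # assumed low-leakage vector
    for latch, d in zip(reg, [1, 0, 1]):   # state held before entering standby
        latch.capture(d)
    print(read_register(reg, sleep=True))   # [0, 1, 1]
    print(read_register(reg, sleep=False))  # [1, 0, 1]
```

In this abstraction, each latch needs only a single hard-wired choice (force 0 or force 1), matching the idea that the low-leakage vector is fixed once during logic design.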
Experimental Results
We have tested this design methodology on the ISCAS-89 benchmark circuits. Table 2 shows the savings realized by this method using the probabilistic technique developed in Section 4 for error tolerances of 5% and 1% and also using a fixed, large sampling of 100,000 input vectors. These "savings" represent the percentage difference between the lowest leakage power I Q 
Conclusion
In this paper, we have proposed a novel design method that can be used during logic design to reduce the leakage power of CMOS circuits that use clock-gating to reduce the dynamic power dissipation. Using minimal additional circuitry, we modify the original logic design to force the combinational logic into a low-leakage state during an idle period. To find such a low-leakage state, we have developed an efficient algorithm that determines a good input vector using a sampling of random vectors. The size of this sampling is determined a priori using user-supplied quality measures. We have demonstrated this method on the ISCAS-89 benchmark circuits and shown leakage power reductions of up to 54%.
