Abstract-Leakage power has become one of the most critical design concerns for the system level chip designer. While lowered supplies (and consequently, lowered threshold voltage) and aggressive clock gating can achieve dynamic power reduction, these techniques increase the leakage power and, therefore, cause its share of total power to increase. Manufacturers face the additional challenge of leakage variability: recent data indicate that the leakage of microprocessor chips from a single 180-nm wafer can vary by as much as 20×. Previously proposed techniques for leakage-power reduction include the use of multiple supply and gate threshold voltages, and the assignment of input values to inactive gates such that leakage is minimized.
I. INTRODUCTION
HIGH-POWER dissipation in integrated circuits shortens battery life, reduces circuit performance and reliability, and has a large impact on packaging costs. Power in complementary metal-oxide-semiconductor (CMOS) circuits consists of dynamic and static (due to leakage currents) components. Leakage is becoming an ever-increasing component of the total dissipated power, with its contribution projected to increase from 18% at 130 nm to 54% at the 65-nm node [21]. Leakage is composed of three major components: 1) subthreshold leakage; 2) gate leakage; and 3) reverse-biased drain-substrate and source-substrate junction band-to-band tunneling leakage [4]. Subthreshold leakage is the dominant contributor to the total leakage at 130 nm and is predicted to remain so in the future [4]. In this paper, we present a novel approach for subthreshold leakage reduction.
Leakage-reduction methodologies can be divided into two classes depending on whether they reduce standby leakage or runtime leakage. Standby techniques reduce the leakage of devices that are known not to be in operation, while runtime techniques reduce the leakage of active devices. Several techniques have been proposed for standby-leakage reduction. Body biasing or variable threshold MOS (VTMOS)-based approaches [12] dynamically adjust the device V th by biasing the body terminal (body biasing has also been proposed to reduce the leakage of active devices [22]). Multithreshold CMOS (MTCMOS) techniques [13], [17], [18], [26] use high-V th CMOS [or negative channel MOS (NMOS) or positive channel MOS (PMOS)] to disconnect Vdd or Vss or both from a logic circuit implemented using low-V th devices in standby mode. Source biasing, where a positive bias is applied in standby state to the source terminals of OFF devices, was proposed in [11]. Other techniques, such as the use of transistor stacks [33] and input vector control [10], have also been proposed.
The only mainstream approach to runtime-leakage reduction is the multi-V th manufacturing process. In this approach, cells in noncritical paths are assigned high V th , while cells in critical paths are assigned low V th . Wei et al. [30] presented a heuristic algorithm for the selection and assignment of an optimal high V th to cells on noncritical paths. The multi-V th approach has also been combined with several other power-reduction techniques [15], [27], [32]. The primary drawback to this technique has traditionally been the rise in process costs due to the additional steps and masks. However, the increased costs have been outweighed by the resulting substantial leakage reductions, and multi-V th processes are now standard. A new complication facing multi-V th is the increased variability of V th for low-V th devices. This occurs in part due to the random doping fluctuations, as well as the worsened drain-induced barrier lowering (DIBL) and short channel effects (SCEs) in devices with lower channel doping. The larger variability in V th degrades the achievable leakage reductions of multi-V th and worsens with continued MOS scaling. Moreover, multi-V th methodologies do not offer a smooth tradeoff between performance and leakage power. Devices with different V th typically have a large separation in terms of performance and leakage, for instance, a 15% speed penalty with a 10× reduction in leakage for high-V th devices.
The use of longer gate lengths (L Gate ) in devices within noncritical gates was first described in [29]. In that study, large changes to the gate lengths were considered, resulting in heavy delay and dynamic power penalties. Moreover, cell layouts with significantly larger gate lengths are not layout swappable with their nominal versions, resulting in substantial engineering change order (ECO) overheads during layout. In this paper, we propose very small increases in gate length for noncritical devices. These small increases maximize the leakage reduction since they take full advantage of the SCE and incur only very small penalties in drive current and input capacitance. Technologies at the 90-nm node and below employ super-halo doping, giving rise to reverse SCEs (RSCEs) that mitigate the traditional SCE to some extent. However, we have found the proposed technique to substantially reduce leakage for the two 130-nm and two 90-nm industrial processes that we investigated. Recent reports from leading integrated device manufacturers (IDMs) indicate that SCE continues to dominate V th roll-off characteristics at the 65- and 45-nm technology nodes [6], [16], [19], [20]. However, we note that the V th roll-off curve must be understood to assess the feasibility of this approach and to determine reasonable increases for the gate length.
The variation of delay and leakage with the gate length is shown in Fig. 1 for an industrial 130-nm process. Leakage current flattens out with gate lengths beyond 140 nm, making L Gate biasing less desirable in that range. Another major advantage of L Gate biasing is leakage variability reduction. Since the sensitivity of leakage to gate length reduces with increased gate length, a fixed level of variability in gate length translates to a reduced variability in leakage. We use the terms gate-length biasing and L Gate biasing interchangeably to refer to the proposed technique. We use the phrase "biasing a device" to imply increasing the gate length of the device slightly.
In this paper, we also assess the costs and benefits of transistor-level L Gate biasing (TLLB). Since different transistors control different timing arcs of a cell, TLLB can individually modify delays of different timing arcs. Our hypothesis is that asymmetry in the timing criticality of different timing arcs of a cell instance in a circuit, and that of the rise and fall transitions, can be used by TLLB to yield significant leakage savings. Ketkar and Sapatnekar [14], Sirichotiyakul et al. [28], and Wei et al. [31] proposed transistor-level V th assignment for leakage-power reduction. Our approach uses L Gate biasing instead of V th assignment and is similar to that of [31]. The major disadvantage of TLLB (or V th assignment) is the increase in library size and its characterization time.
The contributions of our paper include the following: 1) a leakage-reduction methodology based on a less than 10% increase in drawn L Gate of devices; 2) a thorough analysis of the potential benefits and caveats of such a biasing methodology, including the implications of lithography and process variability; 3) experiments and results showing the potential benefits of an L Gate -biasing methodology in different design scenarios such as dual V th . The organization of this paper is as follows. In Section II, we describe the proposed L Gate -biasing methodology for leakage reduction. Section III extends the ideas to be applied at the transistor level for further reduction of leakage at the cost of increased library size. Section IV presents the experiments and results for the validation of the proposed ideas. It also analyzes the potential manufacturing and process variation implications of biasing gate lengths. Finally, Section V concludes with a brief description of ongoing research.
II. CELL-LEVEL GATE-LENGTH BIASING
In this section, we describe the proposed cell-level L Gate -biasing (CLLB) methodology. Our approach extends a standard cell library by adding biased variants to it. We then use a leakage-optimization approach to incorporate slower low-leakage cells into noncritical paths, while retaining faster high-leakage cells in critical paths.
A. Library Generation
We generate a restricted library composed of variants of the 25 most commonly used cells in our test cases. For each cell, we add a biased variant in which all devices have the biased gate length. We consider less than 10% biasing for the following reasons.
1) The nominal gate length of the technology is usually very close to or beyond the "knee" of the leakage versus L Gate curve, which arises due to SCE. For a large bias, the advantage of super-linear dependence of leakage on the gate length is lost. Moreover, the dynamic power and delay both increase almost linearly with the gate length. Therefore, small biases give more "bang for the buck."
2) From a manufacturability point of view (discussed later in Section IV-B), having two prevalent pitches (which are relatively distinct) in the design can harm the printability properties (i.e., size of process window). We retain the same poly pitch as the unbiased version of the cell: there is a small decrease in spacing between gate-poly geometries, but minimum spacing rules are not violated even when the unbiased polys are at minimum spacing, since our biases are within the tolerance margins. Since design rule check (DRC) tools first snap to grid, biases of under 10% are not detected and are considered acceptable due to margins in design rules.
3) An increase in the drawn dimension, which is less than the layout grid resolution (typically 10 nm for 130-nm technology), ensures pin compatibility with the unsized version of the cell. This is very important to ensure that multi-L Gate optimizations can be done post placement or even after detailed routing without ECOs. In this way, we retain the layout transparency that has made multi-V th optimization so adoptable within chip-implementation flows. Biases smaller than the layout grid pitch also ensure design-rule correctness for the biased cell layout, provided that the unbiased version is design-rule correct.
For the simulation program with integrated circuits emphasis (SPICE) models we use, the nominal gate length of all transistors is 130 nm. In our approach, all transistors in a biased variant of a cell have a gate length of 138 nm.
We choose 138 nm as the biased gate length because it places the delay of the low-V th -biased variant between the low-V th -nominal gate-length variant and the nominal-V th -nominal gate-length variant. A larger bias can lead to a larger per-cell leakage saving at a higher performance cost. However, in a resizing setup (described below) with a delay constraint, the leakage benefit over the whole design can decrease as the number of instances that can be replaced by their biased version is reduced. Larger or smaller biases may produce larger leakage reductions for some designs. Libraries, however, are not design specific and a biased gate length that produces good leakage reductions for all designs must be chosen. We have found the abovementioned approach for choosing the biased gate length to work well for all designs. We note that this value of 138 nm is highly process specific and is not intended to reflect the best biased gate length for all 130-nm processes. We discussed biasing at finer levels of granularity (i.e., having multiple biased gate lengths and independently biasing devices within a cell) in [9] . However, we did not find any significant leakage savings beyond those from the approach mentioned above.
Because we investigate very small biases to the gate length, the layout of the biased library cell does not need to change, except for a simple automatic scaling of dimensions. Moreover, since the bias is smaller than the minimum layout grid pitch, design-rule violations do not occur. Of course, after the slight modifications to the layout, the biased versions of the cell are put through the standard extraction and power/timing characterization process.
B. Optimization for Leakage
We perform standard gate sizing (gate width sizing) prior to L Gate biasing using Synopsys Design Compiler v2003.06-SP1. Since delay is almost always the primary design goal, we perform sizing to achieve the minimum possible delay. We use a sensitivity-based downsizing (i.e., begin with all nominal cell variants and replace cells on noncritical paths with biased variants) algorithm for leakage optimization. In our studies, we have found downsizing to be significantly more effective at leakage reduction than upsizing (i.e., begin with all biased variants in the circuit and replace the critical cells with their nominal-L Gate variants), irrespective of the delay constraints. An intuitive rationale is that upsizing approaches have dual objectives of delay and leakage during the cell selection for upsizing. Downsizing approaches, on the other hand, only downsize cells that do not cause timing violations and have the sole objective of leakage minimization. We note that an upsizing approach may be faster when loose delay constraints are to be met since very few transistors have to be upsized. However, delay is almost always the primary design goal and loose delay constraints are rare. A timing analyzer is an essential component of any delay-aware power optimization approach; it is used to compute the delay sensitivity to biasing of cell instances in the design. For an accurate yet scalable implementation, we use three types of timers that vary in speed and accuracy.
1) Standard static timing analysis (SSTA). Slews and actual arrival times (AATs) are propagated forward after a topological ordering of the circuit. The required arrival times (RATs) are back propagated and slacks are then computed. The slew, delay, and slack values of our timer match exactly with Synopsys PrimeTime vU-2003.03-SP2 and our timer can handle unate and nonunate cells.
2) Exact incremental STA (EISTA). We begin with the fanin nodes of the node that has been modified.
From all these nodes, slews and AATs are propagated in the forward direction until the values stop changing. RATs are back propagated from only those nodes for which the slew, AAT, or RAT has changed. The slews, delays, and slacks match exactly with SSTA.
3) Constrained incremental STA (CISTA). The sensitivity computation involves temporary modifications to a cell to find the change in its slack and leakage. To make this step faster, we restrict the incremental timing calculation to only one stage before and after the gate being modified. The next stage is affected by the slew changes and the previous stage is affected by the pin capacitance change of the modified gate. The ripple effect on other stages farther away from the gate (primarily due to slew changes) is neglected since high accuracy is not critical for sensitivity computation.
We use the phrase "downsizing a cell instance" (or node) to mean replacing it by its biased variant in the circuit. In our terminology, $s_p$ represents the slack on a given cell instance $p$, and $s'_p$ represents the slack on $p$ after it has been downsized. $\ell_p$ and $\ell'_p$ indicate the leakage of cell instance $p$ before and after downsizing, respectively. $P_p$ represents the sensitivity associated with cell instance $p$ and is defined as
$$P_p = \frac{\ell_p - \ell'_p}{s_p - s'_p}.$$
The pseudocode for our leakage-optimization implementation is given in Fig. 2. The algorithm begins with SSTA and initializes slack values $s_p$ in Line 1. Sensitivities $P_p$ are computed for all cell instances $p$ and put into a set S in Lines 2-5. We select and remove the cell instance $p^*$ with the largest sensitivity $P_{p^*}$ from the set S. The cell instance $p^*$ is then downsized and EISTA is run from it to update the delay, slew, and slack values in Lines 12-13. Our timing libraries capture the effect of biasing on the slew as well as the input capacitance, and our static timing analyzer efficiently and accurately updates the design to reflect the changes in delay, capacitance, and slew due to the downsizing move. If there is no timing violation (negative slack on any timing arc), then this move is accepted; otherwise, the saved state is restored. If the move is accepted, we also update the sensitivities of node $p^*$, its fan-in nodes, and its fan-out nodes in Lines 17-21. The algorithm continues until the largest sensitivity becomes negative or the size of S becomes zero. Function ComputeSensitivity(q) temporarily downsizes the cell instance q and finds its slack using CISTA. Since high accuracy is not critical for the sensitivity computation, we choose to use CISTA, which is faster but less accurate than EISTA. Table I shows a comparison of leakage and runtime when EISTA and CISTA are used for sensitivity computation.
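The greedy loop described above can be illustrated with a minimal Python sketch. It is not the implementation of Fig. 2: it assumes that sensitivity is the leakage saved per unit of slack consumed, it ignores slew and pin-capacitance effects, and it reruns a full timing analysis for every trial instead of using EISTA or CISTA. The names `Variant`, `run_sta`, and `bias_noncritical_cells`, as well as the four-gate example circuit and its numbers, are hypothetical.

```python
from collections import namedtuple

# Hypothetical per-cell data: delay and leakage of one library variant.
Variant = namedtuple("Variant", "delay leakage")

def run_sta(order, fanin, fanout, delay, period):
    """Full timing analysis on a DAG: arrival times, required times, slacks."""
    aat, rat = {}, {}
    for n in order:  # forward pass in topological order
        aat[n] = max((aat[f] for f in fanin[n]), default=0.0) + delay[n]
    for n in reversed(order):  # backward pass from the outputs
        rat[n] = min((rat[s] - delay[s] for s in fanout[n]), default=period)
    return {n: rat[n] - aat[n] for n in order}

def bias_noncritical_cells(order, fanin, nominal, biased, period):
    """Greedy downsizing; returns the cells swapped to their biased variant."""
    fanout = {n: [] for n in order}
    for n in order:
        for f in fanin[n]:
            fanout[f].append(n)
    delay = {n: nominal[n].delay for n in order}
    slack = run_sta(order, fanin, fanout, delay, period)

    def sensitivity(n):
        # Assumed definition: leakage saved per unit of slack consumed.
        trial = dict(delay)
        trial[n] = biased[n].delay
        new_slack = run_sta(order, fanin, fanout, trial, period)[n]
        saved = nominal[n].leakage - biased[n].leakage
        return saved / max(slack[n] - new_slack, 1e-12)

    candidates, chosen = sorted(order), set()
    while candidates:
        best = max(candidates, key=sensitivity)
        if sensitivity(best) < 0:
            break
        candidates.remove(best)
        trial = dict(delay)
        trial[best] = biased[best].delay
        new_slack = run_sta(order, fanin, fanout, trial, period)
        if min(new_slack.values()) >= -1e-9:  # accept only if timing is met
            delay, slack = trial, new_slack
            chosen.add(best)
    return chosen

# Toy circuit: a and b drive c; b also drives d; c and d are outputs.
order = ["a", "b", "c", "d"]
fanin = {"a": [], "b": [], "c": ["a", "b"], "d": ["b"]}
nominal = {"a": Variant(3.0, 10.0), "b": Variant(1.0, 10.0),
           "c": Variant(2.0, 10.0), "d": Variant(1.0, 10.0)}
biased = {"a": Variant(3.5, 7.0), "b": Variant(1.5, 6.0),
          "c": Variant(2.5, 7.0), "d": Variant(1.5, 8.0)}
chosen = bias_noncritical_cells(order, fanin, nominal, biased, period=5.0)
print(sorted(chosen))  # ['b', 'd']
```

In this toy circuit the path through a and c has zero slack, so only b and d are biased. Recomputing every sensitivity in each iteration keeps the sketch short; updating only the downsized node and its neighbors, as described above, scales far better.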
III. TRANSISTOR-LEVEL GATE-LENGTH BIASING
We use the term timing arc to indicate an intracell path from an input transition to a resulting rise (or fall) output transition. For an n-input gate, there are 2n timing arcs. Due to different parasitics, as well as PMOS/NMOS asymmetries, these timing arcs can have different delay values associated with them. For instance, Table II shows the delay values for the same input slew and load capacitance pair for different timing arcs of a NAND2X2 cell from the Artisan TSMC 130-nm library. Pin swapping is a common postsynthesis timing optimization step to make use of the asymmetry in delays of different input pins. To make use of asymmetry in rise-fall delays, techniques such as PMOS/NMOS (P/N) ratio perturbations have been previously proposed to decrease circuit delay [5]. We propose to exploit these asymmetries using TLLB to "recover" leakage from noncritical timing arcs within a cell.
A. Library Generation
For each cell, our library contains the variants corresponding to all subsets of the set of timing arcs. A gate with n inputs has 2n timing arcs and, therefore, $2^{2n}$ variants (including the original cell). Given a set of critical timing arcs, our goal is to assign a biased L Gate to some transistors in the cell and a nominal L Gate to the remaining transistors such that: 1) critical timing arcs have a delay penalty of under 1% with respect to the original unbiased cell and 2) cell leakage power is minimized. Assignment of L Gate to transistors in a cell, given a set of critical timing arcs, can be done by analyzing the cell topology for simple cells. However, we automate the process in the following manner. We enumerate all configurations for each cell in which nominal L Gate is assigned to some transistors and biased L Gate to the others. For each configuration, we find the delay and leakage under a canonical output load of an inverter (INVX1) using SPICE. Now for each possible subset of timing arcs that can be simultaneously critical, one biasing configuration is chosen based on the two criteria given earlier. Fig. 3 shows L Gate biasing of the transistors in the simplest NAND cell (NAND2X1) when only the rise and fall timing arcs from input A to the output are critical. In this case, only the PMOS device with B as its input can be slowed without penalizing the critical timing arcs.
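The enumeration step can be sketched in Python for a two-input NAND. The function `characterize` is a hypothetical stand-in for the SPICE runs: it assumes that each rise arc depends only on the PMOS device driven by that input, that both fall arcs depend on both series NMOS devices, and that biasing costs 5% delay per affected device and saves 30% of a device's leakage. All of these numbers and names are illustrative assumptions, not characterized data.

```python
from itertools import product

TRANSISTORS = ("PA", "PB", "NA", "NB")  # PMOS/NMOS driven by inputs A and B

# Assumed toy dependency of each timing arc on the devices of a NAND2.
ARC_DEVICES = {
    "A_rise": {"PA"},
    "B_rise": {"PB"},
    "A_fall": {"NA", "NB"},
    "B_fall": {"NA", "NB"},
}
DELAY_PENALTY = 0.05   # assumed delay increase per biased device on an arc
LEAK_SAVING = 0.30     # assumed leakage saving per biased device

def characterize(biased):
    """Stand-in for SPICE: normalized arc delays and cell leakage."""
    delays = {arc: 1.0 + DELAY_PENALTY * len(devs & biased)
              for arc, devs in ARC_DEVICES.items()}
    leakage = sum(1.0 - LEAK_SAVING * (t in biased) for t in TRANSISTORS)
    return delays, leakage

def best_config(critical_arcs, max_penalty=0.01):
    """Lowest-leakage biasing with under 1% penalty on the critical arcs."""
    nominal_delays, _ = characterize(frozenset())
    best = None
    for bits in product((False, True), repeat=len(TRANSISTORS)):
        biased = frozenset(t for t, b in zip(TRANSISTORS, bits) if b)
        delays, leakage = characterize(biased)
        meets_timing = all(
            delays[a] < nominal_delays[a] * (1.0 + max_penalty)
            for a in critical_arcs
        )
        if meets_timing and (best is None or leakage < best[1]):
            best = (biased, leakage)
    return best

config, leakage = best_config({"A_rise", "A_fall"})
print(sorted(config))  # ['PB']
```

Under these assumptions the sketch reproduces the NAND2X1 case described above: with both arcs from input A critical, only the PMOS device driven by B is biased.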
B. Optimization for Leakage
We use a sensitivity-based downsizing approach that is very similar to the one described in Section II-B. We keep track of the slack on every timing arc and compute sensitivity for each timing arc. To limit the runtime and memory requirements, we first optimize at the cell level and then optimize at the transistor level for only the unbiased cells in the circuit.
IV. EXPERIMENTS AND RESULTS
We now describe our test flow for the validation of the L Gate -biasing methodology, and present experimental results. Details of the test cases used in our experiments are given in Table III. To identify the most frequently used cells, we synthesize our test cases with the complete library and select the 25 most frequently used cells. The delay constraint is kept tight so that the postsynthesis delay is close to the minimum achievable delay.
We consider up to two gate lengths and two threshold voltages. We perform experiments for the following scenarios: . We use Cadence SignalStorm v4.1 (with SYNOPSYS HSPICE) for delay and power characterization of cell variants. Synopsys Design Compiler is used to measure the circuit delay, dynamic power, and leakage power. We assume an activity factor of 0.02 for the dynamic power calculation in all our experiments. We do not assume any wire-load models, as a result of which, the dynamic power and delay overheads of L Gate biasing are conservative (i.e., overestimated). All experiments are run on an Intel Xeon 1.4-GHz computer with 2 GB of RAM. Table IV shows the leakage savings and delay penalties due to L Gate biasing for all the cells in our library. The results strongly support our hypothesis that small biases in L Gate can afford significant leakage savings with a small performance impact. To assess the maximum impact of biasing, we explore the power-performance envelope obtained by replacing every device in the design by its device-level biased variant.
A. Leakage Reduction
We now use our leakage-optimization approach to selectively bias cells on the noncritical paths. Table V shows the leakage reduction, dynamic power penalty, and total power reduction for our test cases when L Gate biasing is applied without the dual-V th assignment. Table VI shows results when L Gate biasing is applied together with the dual V th approach. To show the effectiveness of L Gate biasing with loose delay constraints, results when the delay constraint is relaxed are also shown for each circuit. The leakage reductions primarily depend on the slack profile of the circuit. If a lot of paths have near-zero slacks, then the leakage reductions are smaller. As the delay penalty increases, more slack is introduced on paths and larger leakage reductions are seen. We observe that leakage reductions are smaller when the circuit has already been optimized using dual-V th assignment. This is expected because the dual-V th assignment consumes slack on noncritical paths, reducing the slack available for L Gate optimization. We also observe larger leakage reductions in sequential circuits; this is because circuit delay is determined by the slowest pipeline stage and the percentage of noncritical paths is typically higher in sequential circuits.
Our leakage models do not include gate leakage, which can marginally increase due to biasing. Gate leakage is composed of gate-length dependent [gate-to-channel (I gc ) and gate-to-body (I gb ) tunneling] and independent components [edge direct tunneling (I gs + I gd )]. The gate-length independent component, which stems from the gate-drain and gate-source overlap regions, is not affected by biasing. To assess the change in gate-length dependent components due to biasing, we perform SPICE simulations to report the gate-to-channel leakage for both nominal and biased devices. We use 90-nm BSIM4 device models from a leading foundry that model all five components of gate leakage described in BSIM v4.4.0. Table VII shows the gate and subthreshold leakage for biased and unbiased nominal V th NMOS and PMOS devices of 1-µm width at 25 °C and 125 °C. The reductions in subthreshold and gate leakage as well as the total leakage reduction are shown. Based on these results, we conclude that the increase in gate leakage due to biasing is negligible. Furthermore, since biasing is a runtime-leakage-reduction approach, the operating temperature is likely to be higher than room temperature; in this scenario, gate leakage is not a major portion of the total leakage. When the operating temperature is elevated, the reduction in total leakage is approximately equal to the reduction in the subthreshold leakage, and total leakage reductions similar to the results presented in Tables V and VI are expected. Gate leakage is predicted to increase with technology scaling; technologies under 65 nm, however, are likely to adopt high-k gate dielectrics, which will tremendously reduce gate leakage. Therefore, in terms of scalability, subthreshold leakage remains the key problem at high operating temperatures. We also note that because the vertical electric fields do not increase due to biasing, negative-bias temperature instability (NBTI) is not expected to increase with biasing [25].
B. Manufacturability and Process Effects
In this section, we investigate the manufacturability and process variability implications of our L Gate -biasing approach. As our method relies on the biasing of the drawn gate length, it is important to correlate this with the actual printed gate length on the wafer. This is even more important as the bias we introduce in the gate length is of the same order as the typical critical dimension (CD) tolerances in the manufacturing processes. Moreover, we expect larger gate lengths to have better printability properties, leading to less CD variability and hence less leakage variability. To validate our multiple gate-length approach in a postmanufacturing setup, we follow a reticle enhancement technology (RET) and process simulation flow for an example cell master.
We use the layout of a generic AND2X6 cell and perform a model-based optical proximity correction (OPC) on it using Calibre v9.3_2.5 [1]. The printed image of the cell is then calculated using a dense simulation in Calibre. The layout of the cell, along with printed gate lengths of all devices in it, is shown in Fig. 4. We measure the L Gate for every device in the cell, for both biased and unbiased versions. The printed gate lengths for the seven NMOS and PMOS devices labeled in Fig. 4 are shown in Table VIII. As expected, biased and unbiased gate lengths track each other well. There are some outliers that may be due to the relative simplicity of the OPC model being used. The high correlation between the printed dimensions of the biased and unbiased versions of the cells shows that the benefits of biasing estimated using the drawn dimensions will not be lost after the RET application and the manufacturing process.
Another potentially valuable benefit of slightly larger gate lengths is the possibility of improved printability. Minimum poly spacing is larger than the poly gate length, so that the process window (which is constrained by the minimum resolvable dimension) tends to be larger as the gate length increases even though the poly spacing decreases. For example, the depth of focus for various values of exposure latitude with the same illumination system as above for 130- and 138-nm lines is shown in Table IX.
C. Process Variability
A number of sources of variation can cause fluctuations in the gate length, and hence, in performance and leakage. This has been a subject of much discussion in the recent literature (e.g., [8] and [23]). Up to 20× variation in leakage has been reported in the production of microprocessors [7]. For leakage, the reduction in variation post biasing is likely to be substantial as the larger gate length is closer to the "flatter" region of the V th versus L Gate curve. To validate this intuition, we study the impact of gate-length variation on leakage and performance both pre- and postbiasing using a simple worst case approach. We assume the CD variation budget to be ±10 nm. The performance and leakage of the test-case circuits are measured at the worst case, nominal, and best case process corners, which consider only the gate-length variation. This is done for the DVT-DGL approach in which the biasing is done along with the dual V th assignment. The results are shown in Table X. For the seven test cases, we see up to a 41% reduction in leakage-power uncertainty caused by linewidth variation. Such large reductions in uncertainty can potentially outweigh the benefits of alternative leakage-control techniques. We note that the corner case analysis only models the inter-die component of the variation, which typically constitutes roughly half of the total CD variation. To assess the impact of both within-die (WID) and die-to-die (DTD) components of variation, we run 10 000 Monte Carlo simulations with σ WID = σ DTD = 3.33 nm. The variations are assumed to follow a Gaussian distribution with no correlations. We compare the results for three dual V th scenarios: 1) unbiased (DVT-SGL); 2) biased (DVT-DGL); and 3) uniformly biased (when gate lengths of all transistors in the design are biased by 8 nm). Leakage distributions for the test case alu128 are shown in Fig. 5. Note that in uniform biasing, all devices are biased and the circuit delay no longer meets timing.
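The structure of such a Monte Carlo experiment can be sketched in Python. Only σ WID = σ DTD = 3.33 nm and the 130-nm and 138-nm gate lengths come from the text. The exponential leakage model, its 20-nm length scale, the 50-device die, and the reduced count of 2000 samples are assumptions chosen to keep the sketch small; real results require characterized device models.

```python
import math
import random
import statistics

SIGMA_WID = SIGMA_DTD = 3.33   # nm, as stated in the text
LAMBDA_NM = 20.0               # assumed roll-off length scale (hypothetical)

def device_leakage(length_nm, nominal_nm=130.0):
    """Toy model: leakage normalized to 1.0 at the nominal gate length."""
    return math.exp(-(length_nm - nominal_nm) / LAMBDA_NM)

def die_leakages(drawn_nm, n_dies=2000, n_devices=50, seed=0):
    """Total leakage per die under uncorrelated Gaussian DTD and WID shifts."""
    rng = random.Random(seed)
    totals = []
    for _ in range(n_dies):
        dtd = rng.gauss(0.0, SIGMA_DTD)  # one shift shared by the whole die
        totals.append(sum(
            device_leakage(drawn_nm + dtd + rng.gauss(0.0, SIGMA_WID))
            for _ in range(n_devices)
        ))
    return totals

unbiased = die_leakages(130.0)   # every device at the nominal gate length
uniform = die_leakages(138.0)    # every device biased by 8 nm

spread_unbiased = statistics.pstdev(unbiased)
spread_uniform = statistics.pstdev(uniform)
print(spread_uniform < spread_unbiased)  # True
```

Because the assumed model is a pure exponential, biasing scales the mean and the absolute spread by the same factor. A model that flattens at longer gate lengths, as the measured curves do, would shrink the relative spread as well.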
D. Leakage Reduction From Transistor-Level Gate-Length Biasing
Table XI presents the leakage-power reductions from TLLB over CLLB. We see up to a 10% reduction in leakage power over CLLB. Since TLLB only biases devices of the unbiased cells, it performs well over CLLB when CLLB does not perform well (i.e., when CLLB leaves many cells unbiased). The leakage savings from TLLB come at the cost of increased library size. As described in Section III-A, the library is composed of all $2^{2n}$ variants of each n-input cell. For the 25 cells, our library for TLLB was composed of a total of 920 variants. From the small leakage savings at the cost of significantly increased library size, we conclude that TLLB should only be performed for single- and double-input cells that are frequently used.
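The growth in library size follows directly from the variant count of Section III-A, as a short Python calculation shows. The function name `tllb_variant_count` is hypothetical.

```python
def tllb_variant_count(n_inputs):
    """Variants per cell: one for each subset of its 2n timing arcs."""
    timing_arcs = 2 * n_inputs
    return 2 ** timing_arcs

counts = {n: tllb_variant_count(n) for n in (1, 2, 3, 4)}
print(counts)  # {1: 4, 2: 16, 3: 64, 4: 256}
```

Each additional input multiplies the number of variants by four, which is why restricting TLLB to one- and two-input cells keeps characterization effort manageable.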
V. CONCLUSION AND ONGOING STUDIES
We have presented a novel methodology that selectively uses small L Gate biases to achieve an easily manufacturable approach to runtime-leakage reduction. For our test cases, we have observed the following.
1) The gate-length bias we propose is always less than the pitch of the layout grid; this avoids design-rule violations. Moreover, it implies that the biased and unbiased cell layouts are completely pin compatible, and hence, layout swappable. This allows biasing-based leakage optimization to be possible at any point in the design flow, unlike in sizing-based methods.
2) With a biasing of 8 nm in a 130-nm process, leakage reductions of 24%-38% are achieved for the most commonly used cells with a delay penalty of less than 10%.
3) Using simple sizing techniques, we are able to achieve up to 33% leakage savings with less than 3% dynamic power overhead and no delay penalty. The use of more than two gate lengths for the most commonly used cells, along with improved sizing techniques, is likely to yield better leakage savings.
4) We compared gate-length biasing at the cell level and at the transistor level. Transistor-level gate-length biasing can further reduce leakage by up to 10%, but requires a significantly larger library. Therefore, transistor-level biasing should be done for only the most frequently used cells such as inverters, buffers, NAND, and NOR gates. Fortunately, the most frequently used cells have one or two inputs and hence, only a small number of transistor-level biasing variants need to be characterized for them. For cells with three or more inputs, no transistor-level biasing variants may be created (i.e., only cell-level biasing variants are created). To further reduce the library size, only one of the cell variants in which different logically equivalent inputs are fast may be retained, and pin-swapping techniques can be used during leakage optimization.
5) The devices with biased gate lengths are more manufacturable and have a larger process margin than the nominal devices. Biasing does not require any extra process steps, unlike multiple-threshold-based leakage-optimization methods.
6) L Gate biasing leads to more process-insensitive designs with respect to the leakage current. Biased designs have up to 41% less worst case leakage variability in the presence of inter-die variations, as compared to the nominal gate-length designs. In the presence of both inter- and intra-die CD variations, selective L Gate biasing can yield designs less sensitive to variations.
Our ongoing study is along the following directions.
1) Construction of effective biasing-based leakage-optimization heuristics. To increase scalability, we plan to investigate "batched" moves in which several independent cells or transistors are biased in every iteration.
2) Assessment of leakage savings from the use of more than two gate lengths for the more frequently used and leaky cells in the library, such as inverters and buffers. Also, the development of better approaches to reduce the cell library size.
3) Evaluation of the impact of biasing on leakage at future technology nodes for which leakage is a much bigger issue than it is at 130 nm.
ACKNOWLEDGMENT
A preliminary version of this paper appeared in [9] .
