Abstract-We propose a transistor sizing method that downsizes MOSFETs inside a cell to eliminate redundancy of cell-based circuits as much as possible. Our method reduces power dissipation of detail-routed circuits while preserving interconnects. The effectiveness of our method is experimentally evaluated using 5 circuits. The power dissipation is reduced by 77% maximum and 65% on average without delay increase.
I. INTRODUCTION
Cell-based design has a well-established framework for the development of ASICs, and has been widely adopted. On the other hand, cell-based circuits inherently contain redundancy, for example, in power dissipation. In this paper, we propose a post-layout transistor sizing method for power reduction. Our method aims to reduce the redundancy of cell-based design and to obtain high-performance circuits close to full-custom quality while keeping the cell-based design framework. We downsize MOSFETs inside a cell continuously, and generate the corresponding cell layout on the fly. The cell layout generation system used in our method does not change the locations of input and output pins while the transistor widths inside a cell are varied [1]. Exploiting this feature, we can optimize detail-routed circuits without any modification of interconnects, using the precise wire capacitance values extracted from the detail-routed circuits.
Many transistor sizing methods for delay and power optimization have been proposed [2, 3, 4, 5, 6]. These methods need to derive the delay time of each cell at any MOSFET size. Refs. [2, 3, 4] utilize the Elmore delay model. With this delay model, the optimal solution of the formulated problem can be obtained using a simple variable-transformation method. However, the accuracy of the delay model is not high enough, and hence the optimized circuits may violate the delay constraints. In Refs. [5, 6], the cell delay is approximated as a linear function of the cell size, and transistor sizing is formulated as a linear optimization problem. This method can also obtain the optimal solution of the formulated problem. However, the linearization of the cell delay may introduce errors in timing analysis.
Recently, the delay due to wire capacitance has come to occupy a considerable part of the total circuit delay. Many of the previous transistor sizing methods [2, 3, 5, 6] concentrate on circuit-level optimization and do not sufficiently consider the layout. When the optimization result is applied to the layout, routing is affected, i.e. wire capacitances in the resulting layout differ from those of the initial circuit before transistor sizing. This variation of wire capacitance may cause a violation of delay constraints. In Ref. [4], transistor sizing, rerouting and compaction are applied to the circuit repeatedly to take the layout into account. In a deep-submicron (DSM) process, however, coupling capacitances between adjacent interconnects in the same metal layer or in two successive metal layers become dominant. Accurate capacitance evaluation of all the interconnects influenced by rerouting and compaction becomes computationally intensive, and hence repeated evaluation inside the optimization loop may become impractical.
Our method handles detail-routed circuits designed in a cell-based design style. It downsizes MOSFETs inside a cell for power reduction without any modification of wiring, using accurate values of wire capacitance. We use a cell layout generation system called VARDS [1] that can generate cell layouts with variable transistor widths while keeping the locations of terminals unchanged. In order to obtain accurate cell delay times, our method utilizes four-dimensional look-up tables with four variables: the gate widths of PMOS and NMOS transistors, the input transition time, and the load capacitance.
This paper is organized as follows. Section II explains the post-layout transistor sizing method, including cell layout generation, the cell delay model, and the transistor sizing algorithm. Section III presents experimental results. Finally, Section IV concludes the discussion.
II. POST-LAYOUT TRANSISTOR SIZING
In this section, we explain a transistor sizing method for power reduction preserving interconnects. We first discuss cell layout generation for post-layout transistor sizing. Next, we show a cell delay model that can calculate delay time for any PMOS and NMOS transistor sizes. Then, the noise margin constraints that guarantee the correct behavior of the circuits are discussed. Finally, we explain a transistor sizing algorithm for power reduction.
Fig. 1. Examples of AOI21 cell layouts: (a) all transistor widths are the maximum, (b) every transistor width is different.
A. Cell Layout Generation
In order to apply the optimization result to the layout without any modification of interconnects, two features are required of cell layout generation. First, the transistor widths inside a cell must be variable. Second, the locations of input/output pins must be fixed to preserve interconnects. A cell layout generation system, VARDS, which satisfies these two requirements, has been proposed [1]. Fig. 1 shows an example of AOI21 cells whose height is 9 interconnect pitches. The AOI21 cell in Fig. 1(a) is generated such that all transistor widths are the maximum. Fig. 1(b) is an example in which every transistor width is different.
B. Cell Delay Model
In the proposed method, PMOS and NMOS transistors inside a cell are resized separately. Our method hence requires a cell delay model that has four variables: the gate widths of the PMOS and NMOS transistors, the input transition time, and the load capacitance. Look-up tables over these four variables are built beforehand using a circuit simulator. Cell delay time is derived from the look-up tables using the following three-step interpolation (Fig. 2). In the case of a multi-stage cell, we divide the cell into single-stage cells, and calculate the delay time of each single-stage cell.
Step 1: Find P1, P2, P3, P4.
Step 2:
Step 3:
The coefficients of each interpolation equation are determined such that the four values of the neighboring points are assigned to the equation. The transition time of the output signal is calculated similarly. In the case of the dissipated energy, Eq. (4) is used for the interpolation at Step 3.
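The interpolation equations themselves are not reproduced in this text, so the following is only a minimal sketch of a generic table look-up. It assumes plain multilinear interpolation between the neighboring grid points, which is a stand-in technique and not necessarily the three-step scheme of Fig. 2. The axis values, the table contents, and the names `bracket`, `interp4d` and `toy_delay` are all hypothetical.

```python
from bisect import bisect_right
from itertools import product


def bracket(axis, x):
    """Return (lower index, fractional position) of x on a sorted axis."""
    # Clamp to a valid interval; outside the axis range the fraction leaves
    # [0, 1], which amounts to linear extrapolation from the end interval.
    i = min(max(bisect_right(axis, x) - 1, 0), len(axis) - 2)
    t = (x - axis[i]) / (axis[i + 1] - axis[i])
    return i, t


def interp4d(axes, table, point):
    """Multilinear interpolation in a 4-D table (assumed scheme).

    axes  : four sorted lists (Wp, Wn, input transition time, load capacitance)
    table : nested lists, table[i][j][k][l] = characterized delay
    point : the four query values, in the same order as axes
    """
    brackets = [bracket(a, x) for a, x in zip(axes, point)]
    result = 0.0
    # Sum over the 2^4 = 16 corners of the enclosing grid cell.
    for corner in product((0, 1), repeat=4):
        weight = 1.0
        value = table
        for (i, t), c in zip(brackets, corner):
            weight *= t if c else (1.0 - t)
            value = value[i + c]
        result += weight * value
    return result


# Toy 2x2x2x2 table filled from a made-up delay function (not real data).
axes = [[0.9, 6.2], [0.9, 6.2], [0.1, 1.0], [0.01, 0.1]]


def toy_delay(wp, wn, slew, cload):
    return 0.05 - 0.002 * wp - 0.003 * wn + 0.2 * slew + 3.0 * cload


table = [[[[toy_delay(wp, wn, s, c) for c in axes[3]] for s in axes[2]]
          for wn in axes[1]] for wp in axes[0]]

query = (3.0, 2.0, 0.4, 0.05)
print(interp4d(axes, table, query), toy_delay(*query))
```

Because the toy delay function is linear in each variable, the interpolated value matches the direct evaluation up to rounding; a real characterized table would show an interpolation error that shrinks as the grid is refined.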
C. Noise Margin Constraints
Adequate noise margins are important to ensure the correct behavior of the circuits. From the definition of the noise margins, an upper bound can be derived from two equations [7, 8]. Similarly, the lower bound can be obtained from two further equations. These equations involve the threshold voltages of PMOS and NMOS transistors. We resize PMOS and NMOS transistors for power reduction within the range between the lower and upper bounds.
D. Transistor Sizing Algorithm
We devise a transistor sizing algorithm for power reduction based on sensitivity calculation. Our algorithm executes an iterative optimization that decreases transistor widths by a step size, a variable that represents the amount of transistor width reduced in a single iteration. When the step size is smaller than a pre-defined value, the optimization procedure finishes.
Step 3: At each cell, evaluate the sensitivity, i.e. the amount of power reduction when the transistor widths decrease by the step size. If a violation of the noise margin or transition time constraints occurs, the sensitivity calculation is not performed.
First, the above algorithm is executed for power reduction such that PMOS and NMOS transistors are resized simultaneously with the same PMOS-to-NMOS ratio. We next optimize power dissipation by resizing PMOS and NMOS transistors independently, and we then get the final optimization result.
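Only the sensitivity step and the termination condition of the algorithm are legible in this text, so the sketch below is an assumption about how such a loop could be arranged, not the authors' procedure. It greedily applies the move with the largest power saving, skips cells whose move would violate a constraint, and halves the step when no move is possible (the halving rule is a guess). The toy power and delay models, the 30% delay slack, and all names and numbers are hypothetical.

```python
def downsize(widths, power, feasible, w_min, step, min_step):
    """Greedy sensitivity-based downsizing (illustrative sketch only).

    widths   : initial transistor width per cell
    power    : callable(widths) -> estimated power
    feasible : callable(widths) -> True if all constraints hold
    """
    widths = list(widths)
    while step >= min_step:
        base = power(widths)
        best_gain, best_cell = 0.0, None
        for i, w in enumerate(widths):
            if w - step < w_min:
                continue  # below the smallest width the layout generator supports
            trial = widths[:i] + [w - step] + widths[i + 1:]
            if not feasible(trial):
                continue  # constraint violated: no sensitivity for this cell
            gain = base - power(trial)  # sensitivity = power saved by this move
            if gain > best_gain:
                best_gain, best_cell = gain, i
        if best_cell is None:
            step /= 2.0  # assumed refinement: halve the step when stuck
        else:
            widths[best_cell] -= step
    return widths


# Toy three-cell chain in arbitrary units; every number below is made up.
GATE_CAP = 1.0                 # gate capacitance per unit width
WIRE_CAP = [2.0, 2.0, 2.0]     # fixed wire load at each cell output
ACTIVITY = [0.5, 0.3, 0.2]     # switching activity of each cell


def toy_power(widths):
    return sum(a * GATE_CAP * w for a, w in zip(ACTIVITY, widths))


def toy_delay(widths):
    total = 0.0
    for i, w in enumerate(widths):
        next_gate = GATE_CAP * widths[i + 1] if i + 1 < len(widths) else 0.0
        total += (WIRE_CAP[i] + next_gate) / w
    return total


initial = [6.2, 6.2, 6.2]
delay_limit = 1.3 * toy_delay(initial)  # slack granted only for illustration
final = downsize(initial, toy_power, lambda ws: toy_delay(ws) <= delay_limit,
                 w_min=0.9, step=0.8, min_step=0.05)
print(final, toy_power(final), toy_delay(final))
```

Wire capacitances stay fixed throughout the loop, which mirrors the interconnect-preserving premise: only the gate-capacitance terms change as widths shrink.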
III. EXPERIMENTAL RESULTS
In this section, some experimental results are shown. We first demonstrate the accuracy of the cell delay model based on look-up tables. We next show the power optimization results.
We generate cell layouts using VARDS [1] in a 0.35 µm process with three metal layers. The cell height is 13 interconnect pitches, and the size ratio of PMOS and NMOS transistors is 1. In transistor sizing, we downsize MOSFETs within the range in which VARDS can generate cell layouts. The maximum transistor width of standard driving-strength (x1) cells is 6.2 µm, and the value of W/L is 15.5. The transistor width can be reduced to 0.9 µm. Reference [9] reports that the optimal value of W/L is around 20. The transistor width of our library is smaller than the reported value.
A. Accuracy of Cell Delay Model
We first examine the accuracy of the cell delay model. We use INV, 2-input NAND and 2-input NOR cells of standard driving-strength(x1) for this experiment. In the case of NAND and NOR cells, we evaluate the characteristics of the input pin are excluded. When the absolute value of the delay time is extremely small, the relative error becomes meaninglessly large while absolute error is sufficiently small. We hence do not calculate the relative error when the delay time is less than 0.01ns. The size of look-up tables is 5x5x5x5. Table 1 
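The exclusion rule for very small delays can be illustrated with a short sketch. It assumes the 0.01 ns cutoff is applied to the reference (simulated) delay; the function name `relative_errors` and the sample values are hypothetical and are not measurements from the paper.

```python
def relative_errors(model_ns, reference_ns, min_delay_ns=0.01):
    """Relative error of table-based delays against reference delays.

    Points whose reference delay is below min_delay_ns are skipped, because
    the relative error is not meaningful for near-zero delays.
    """
    errors = []
    for m, r in zip(model_ns, reference_ns):
        if abs(r) < min_delay_ns:
            continue
        errors.append(abs(m - r) / abs(r))
    return errors


# Made-up sample points (ns); the third one falls under the cutoff.
model = [0.105, 0.52, 0.006]
reference = [0.100, 0.50, 0.004]
errs = relative_errors(model, reference)
print(len(errs), max(errs))
```

Without the cutoff, the third point alone would report a 50% relative error although its absolute error is only 0.002 ns.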
B. Power Optimization Results
We show the results of power optimization. The circuits used for the experiments are an ALU in a DSP for mobile phones [11] (dsp alu) and circuits included in the ISCAS85 and LGSynth93 benchmark sets (C3540, alu4, C7552, des). These circuits are synthesized under two different constraints [10]: minimizing the circuit delay, and minimizing the circuit area. In addition, two transition time constraints, 0.5 ns and 1.0 ns, are given. Thus, each circuit is synthesized under four different sets of constraints in total. We generate the layouts of the synthesized circuits and utilize the wire capacitance values extracted from the layouts for transistor sizing. The circuit scale ranges from 943 to 12460 cells. The cell library used for generating the initial circuits includes six driving-strength varieties for INV and BUF (x1, x2, x3, x4, x6 and x8). In the case of NAND2, NAND3, AND2, AND3, NOR2, NOR3, OR2, OR3, AOI21 and OAI21 cells, there are four varieties (x1, x2, x3, x4). The circuit delay time is evaluated by a transistor-level static timing analysis tool [12], and the power dissipation is estimated by a transistor-level power simulator [13]. The input patterns are randomly generated with a transition probability of 0.5. The number of applied patterns is 100, which is an adequate number for power estimation at circuit level [14]. The cycle time of the input patterns is 100 ns.
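As a rough illustration of the stimulus described above, the sketch below generates random input patterns. It assumes that "transition probability of 0.5" means each primary input toggles with probability 0.5 between consecutive patterns; that reading, the function name `random_patterns`, the seed, and the input count of 16 are all assumptions.

```python
import random


def random_patterns(n_inputs, n_patterns, p_transition=0.5, seed=0):
    """Generate n_patterns input vectors; each bit toggles with p_transition."""
    rng = random.Random(seed)
    current = [rng.randint(0, 1) for _ in range(n_inputs)]
    patterns = [current[:]]
    for _ in range(n_patterns - 1):
        # XOR with a Bernoulli draw flips the bit with probability p_transition.
        current = [b ^ (rng.random() < p_transition) for b in current]
        patterns.append(current[:])
    return patterns


patterns = random_patterns(n_inputs=16, n_patterns=100)
toggles = sum(a != b
              for prev, cur in zip(patterns, patterns[1:])
              for a, b in zip(prev, cur))
print(toggles / (16 * 99))  # observed toggle rate
```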
Table 2 shows the power optimization results. The column "Total Width" represents the sum of the gate widths of MOSFETs in the circuit. "CPU Time" represents the CPU time required for power optimization on an Alpha Station. Our method reduces power dissipation by 77% maximum and 65% on average. The total transistor width is reduced to 25% of that of the initial circuits. The power reduction in small circuits is larger than that in large circuits, because large circuits usually have heavier wire loads. In the case of the largest circuit, dsp alu, the power dissipation is reduced by about 50%. In some circuits, the circuit delay increases even though the initial delay time is given as the delay constraint. One reason is that the optimized circuits become sensitive to the error of the cell delay model [15]. Further examination of the reasons is required, considering the accuracy of the delay calculation tool as well.
We examine the optimization result of the des circuit generated for minimum circuit delay under the transition time constraint of 0.5 ns. Fig. 3(a) shows a part of the initial layout. Fig. 3(b) shows the transistor-sized layout at the same location. The transistor sizes inside cells differ from instance to instance. PMOS and NMOS transistors inside each cell are resized separately. The routing is also perfectly preserved. Our method generates cell layouts on the fly according to the optimization results, and replaces cells without any interconnect modifications.
We first demonstrate the relationship between the amount of power reduction and the increase of driving-strength varieties. Halving the width-reduction step in the optimization algorithm (Sec. II.D) corresponds to halving the intervals of driving-strength and doubling the number of driving-strength varieties. We classify the driving-strength varieties into 10 levels (Table 3). Fig. 4 indicates the relationship between power dissipation and driving-strength level. The power dissipation is reduced as the driving-strength varieties increase.
We next show the distributions of transistor widths in the optimized circuit (Fig. 5). The transistor width of a standard driving-strength (x1) cell is 6.2 µm. The sum of NMOS gate widths is 9.4 mm. Fig. 6 shows the slack distributions of the initial and optimized circuits. By transistor sizing, the number of cells with zero or almost zero slack increases drastically. The sum of slack in the optimized circuit is 1241 ns, whereas the sum of slack in the initial circuit is 3122 ns. The total slack is reduced by 60%. We then demonstrate the capacitance reduction in the circuit (Fig. 7). Our method does not modify any interconnects, so wire capacitance does not change. The gate capacitance of MOSFETs is reduced by 77%, which results in a 61% reduction of the total capacitance.
We show the peak current reduction. We apply 100 input patterns, and evaluate the peak current at each time-step within a cycle. Fig. 8 indicates the peak current of the initial and optimized circuits. The horizontal axis represents the time within a cycle of 3.4 ns. The peak current is reduced by 74%. The path-balancing effect of our method contributes to the peak current reduction, as does the gate capacitance reduction. The transition timing of each cell is well distributed throughout a cycle. Reducing the peak current is effective in avoiding the IR drop problem. The current reduction is also a useful way to mitigate electromigration. The mean time to failure (MTF) due to electromigration is given in [16] by an expression in which the current appears with an exponent close to 2; the expression also includes a length term. The current reduction of 74% increases MTF 15 times. Thus, our method can increase the tolerance to IR drop and electromigration problems, and contribute to high-reliability LSI design.
We finally show the power optimization results when the initial circuits are generated using a low-power cell library. The delay time of each initial circuit is given as the delay constraint. The cell height of this low-power library is 9 interconnect pitches, and the standard transistor size is 3.4 µm. The results are shown in Table 4. Even when the low-power cell library is used for the initial circuits, our method reduces power dissipation by more than 50% on average.
C. Effectiveness of Interconnect Preservation
The proposed method optimizes a detail-routed circuit without any wiring modifications. We verify the effectiveness of this interconnect preservation. In a conventional transistor sizing method, the layout is modified using an ECO (Engineering Change Order) technique in order to preserve the placement and wiring as much as possible. However, a certain amount of variation in wire capacitance is unavoidable.
We examine the effect of this capacitance variation statistically. We assume that the wire capacitance varies according to a normal distribution N(μ, σ) because of interconnect modifications, i.e. ECO. The mean μ is the initial value used in transistor sizing, and the standard deviation σ is 20% of the initial value. The delay distribution is obtained using a Monte Carlo technique. The number of delay evaluations is 10,000. Fig. 9 shows the delay variation in the optimized des circuit. As shown, the interconnect modifications increase the circuit delay. A circuit whose delay is the same as that of the initial circuit (3.36 ns) can hardly be obtained. The circuit delay at "mean + 3σ" is 3.60 ns, which is 7% larger than the delay without wiring modifications. The proposed method can avoid this delay increase, thanks to the interconnect preservation.
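The statistical experiment can be mimicked on a toy circuit. The sketch below assumes a made-up delay model (the slowest of several equally long paths, each stage costing a fixed intrinsic delay plus a term proportional to its wire capacitance) and draws every wire capacitance independently from a normal distribution with a 20% standard deviation. The model, the constants, and the names `circuit_delay` and `monte_carlo_delay` are hypothetical; only the sampling setup follows the description above.

```python
import random
import statistics


def circuit_delay(wire_caps, intrinsic=0.1, k=0.5):
    """Toy delay: the slowest path, each stage costing intrinsic + k * C_wire."""
    return max(sum(intrinsic + k * c for c in path) for path in wire_caps)


def monte_carlo_delay(nominal_caps, rel_sigma=0.2, n_samples=10_000, seed=1):
    """Sample circuit delays with each wire capacitance drawn from N(mu, 0.2*mu)."""
    rng = random.Random(seed)
    delays = []
    for _ in range(n_samples):
        # Clamp at zero so that a rare extreme draw cannot go negative.
        sample = [[max(0.0, rng.gauss(c, rel_sigma * c)) for c in path]
                  for path in nominal_caps]
        delays.append(circuit_delay(sample))
    return delays


# Five balanced paths of four stages each (arbitrary units, made-up values).
nominal = [[0.4, 0.4, 0.4, 0.4] for _ in range(5)]
d0 = circuit_delay(nominal)
delays = monte_carlo_delay(nominal)
mean = statistics.mean(delays)
sigma = statistics.pstdev(delays)
print(d0, mean, mean + 3 * sigma)
```

Because the circuit delay is the maximum over paths that were balanced at the nominal capacitances, random variation pushes the average delay above the nominal value, which is qualitatively the effect reported for the optimized des circuit.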
IV. CONCLUSION
We propose a power reduction method that down-sizes MOSFETs in a cell without any interconnect modifications. The effectiveness of our method is experimentally verified using 5 benchmark circuits. The power dissipation is reduced by 77% maximum and 65% on average without delay increase. We verify that our method also contributes to high-reliability LSI design.
