Introduction
The synthesis of large-scale VLSI systems calls for high-performance design techniques. The merits of a high-quality design are fast switching speeds, small silicon area, and low power dissipation. To improve the switching speed of a particular circuit block on the critical path, one may seek to increase the widths of the transistors in the block, but this increase also increases the capacitive loading of the preceding block and may adversely affect the overall circuit delay. Moreover, an increased current drive for the present block, coupled with a slower transition time at the output of the preceding block, also increases the power dissipation of the circuit. Thus, the delay, the area, and the power dissipation are closely interlinked, and an optimization program attempts to yield a satisfactory tradeoff between them.
A variety of approaches have been suggested in the past for transistor sizing and performance improvement of VLSI systems. TILOS [1] has been used to size many practical circuits. In its search for the optimum, TILOS increases the size of the critical transistor by a fixed value during each iteration. However, situations can arise where the sizes of the transistors along a critical path need to be reduced; TILOS is not very well [...] characterization of the effective transition-resistances by finding proper regression coefficients. Such methods may be time and space consuming and may incorporate interpolation errors as well.
In this paper we introduce a transistor sizing tool, ASAP, that minimizes the delay, the area, and the power dissipation (or a combination thereof) of a circuit by optimizing the sizes of the gates on the N most critical paths of the circuit. The critical paths are obtained from a timing analyzer. In the course of the optimization process, a situation may arise where an alteration in the size of a component reduces the delay of a specific critical path at the expense of increasing the delay of some other path(s). In such a case, the change is effected only if the resultant maximum delay over all the paths after the size change is less than the maximum delay before the change. We take a global picture of the circuit into account and refrain from making circuit changes that actually worsen the overall performance.
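The acceptance rule described above can be sketched in a few lines. This is only an illustration: the function name, the path labels, and the delay values are hypothetical, and the path delays are assumed to come from a timing analyzer.

```python
from typing import Dict


def accept_size_change(delays_before: Dict[str, float],
                       delays_after: Dict[str, float]) -> bool:
    """Accept a size change only if the worst path delay strictly improves."""
    return max(delays_after.values()) < max(delays_before.values())


# Hypothetical delays (ns) of two monitored paths before a size change.
before = {"path_1": 2.0, "path_2": 1.8}
# path_1 gets faster, path_2 slower, but the worst case drops from 2.0 to 1.9.
helps_globally = {"path_1": 1.7, "path_2": 1.9}
# path_1 gets faster, but path_2 now dominates and the worst case rises to 2.1.
hurts_globally = {"path_1": 1.6, "path_2": 2.1}

print(accept_size_change(before, helps_globally))
print(accept_size_change(before, hurts_globally))
```

The second change speeds up one path yet is rejected, because the circuit as a whole becomes slower.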
The optimization technique in ASAP incorporates accurate analytical models in the formulation of the cost function and uses closed-form equations for faster evaluation of the gate delay and the power dissipation. The delay and power approximations for an inverter are based on Sakurai's α-power law MOSFET model [7]. Static gates, other than the inverter, are first mapped to an equivalent inverter [8] and then analyzed.
Our optimization algorithm can be tailored to yield suitable tradeoffs between delay, area, and power by suitably choosing the weights for these parameters. Each weight is a number in the closed interval [0, 1] and determines the relative sensitivity of the optimization cost function to a change in the corresponding parameter value.
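To make the role of the weights concrete, the sketch below uses a normalized weighted sum. This form is an assumption chosen for illustration and is not claimed to be the cost function used in ASAP; the function name, the reference values used for normalization, and the numbers in the example are all hypothetical.

```python
def weighted_cost(delay, area, power,
                  ref_delay, ref_area, ref_power,
                  w_delay=1.0, w_area=0.0, w_power=0.0):
    """Illustrative cost: each metric is normalized by a reference value
    (for instance its initial value) and scaled by a weight in [0, 1]."""
    for w in (w_delay, w_area, w_power):
        if not 0.0 <= w <= 1.0:
            raise ValueError("each weight must lie in the closed interval [0, 1]")
    return (w_delay * delay / ref_delay
            + w_area * area / ref_area
            + w_power * power / ref_power)


# Hypothetical metrics: delay unchanged, power reduced to 80 % of its reference.
delay_only = weighted_cost(2.0, 10.0, 4.0, 2.0, 10.0, 5.0, w_delay=1.0)
delay_and_power = weighted_cost(2.0, 10.0, 4.0, 2.0, 10.0, 5.0,
                                w_delay=1.0, w_power=0.5)
print(delay_only, delay_and_power)
```

Setting a weight to zero removes the corresponding metric from the objective, which is how a pure delay optimization or a pure power-delay tradeoff would be expressed in this sketch.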
The paper is organized as follows. In Section 2, we clarify how the critical and the non-critical path capacitances are considered during the optimization process. Section 3 describes the area, the delay, and the power calculations for static CMOS gates. The details of the optimization procedure (its problem formulation and its solution technique) are given in Section 4. Some results and comparisons with a different optimization program are presented in Section 5. Section 6 draws the conclusion.
Capacitance Considerations
The most critical path of a circuit is the path along which a signal transition takes the maximum amount of time to propagate from a primary input to a primary output. The maximum operating speed of a circuit is therefore limited by the critical-path delay. The goal of our algorithm is to optimally size the transistors on a set of N critical paths of a circuit in order to reduce the circuit delay. The capacitive loading of a block that is connected not only to the next block on the critical path but also fans out to some other non-critical-path blocks is extracted by a parasitic extractor. The set of critical paths, along with the interconnect and the fanout loadings from the critical and the non-critical regions of the circuit, forms the input to the optimization program. As an example, let us consider the circuit shown in Figure 1. The path b-c-f-g-h is the critical path of the circuit. G1 fans out to G2 and G5, where G5 is in a non-critical region of the circuit. The transistors on the critical path, constituting G1 and G2, are assumed to contribute a variable capacitance (depending on their widths), whereas the loading from G5, along with the interconnect capacitance of the output net of G1, forms a fixed capacitance. The fixed and the variable capacitances are both lumped at the output of G1 while modeling its delay during the optimization process.
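The split into fixed and variable capacitance at the output of G1 can be illustrated as follows. The linear dependence on transistor width and the per-micron coefficients are assumptions made for this sketch, as are the function name and the example numbers; they are not technology data from the paper.

```python
def lumped_output_capacitance(driver_width_um, load_width_um, fixed_cap_fF,
                              c_drain_fF_per_um=1.0, c_gate_fF_per_um=2.0):
    """Total capacitance lumped at a gate output, in fF.

    variable part: drain capacitance of the driver (G1) plus gate capacitance
                   of the critical-path load (G2), both assumed linear in width
    fixed part:    non-critical fanout (G5) plus interconnect, which stay
                   constant during the optimization
    """
    variable_cap = (c_drain_fF_per_um * driver_width_um
                    + c_gate_fF_per_um * load_width_um)
    return fixed_cap_fF + variable_cap


# Hypothetical sizes: G1 is 4 um wide, G2 is 6 um wide, fixed load is 10 fF.
c_before = lumped_output_capacitance(4.0, 6.0, 10.0)
# Widening G2 to 8 um changes only the variable part.
c_after = lumped_output_capacitance(4.0, 8.0, 10.0)
print(c_before, c_after)
```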
Power and Delay Modeling
The delay modeling of an inverter involves the calculation of not only the inverter delay but also the transition time at the output of the inverter. Using Sakurai's equations [7], we approximate the delay, the output transition time, and the short-circuit power dissipation of an inverter. The total power is calculated as the sum of the short-circuit power and the switching power. The latter is calculated as $C_L V_{DD}^2$.
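The power bookkeeping described above amounts to a single sum. In the sketch below the switching term is written in the familiar textbook form, load capacitance times supply voltage squared times switching frequency; the frequency factor, the function name, and every numeric value are assumptions of this example. The short-circuit term is passed in as a given number rather than computed from Sakurai's equations.

```python
def total_power(short_circuit_power_W, load_cap_F, vdd_V, switching_freq_Hz):
    """Total power = short-circuit power + switching power (watts).

    The switching term uses the textbook form C_L * V_DD^2 * f; the frequency
    factor f is an assumption of this sketch.
    """
    switching_power = load_cap_F * vdd_V ** 2 * switching_freq_Hz
    return short_circuit_power_W + switching_power


# Hypothetical inverter: 1 uW short-circuit power, 100 fF load, 5 V, 10 MHz.
p_total = total_power(1e-6, 100e-15, 5.0, 10e6)
print(p_total)
```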
For NAND and NOR gates, the gate is first mapped to an equivalent inverter, which is then analyzed. The equivalent inverter for each gate depends on the type of the gate (i.e., the gate structure), the gate size (i.e., the width of the transistors), the number of inputs the gate has, the particular input that is switching, the switching transition (i.e., whether the input to the gate is rising or falling), and some technology-dependent parameters that determine the junction capacitances associated with the transistor terminals. The equivalent-inverter mapping [8] consists of two parts: finding the equivalent-inverter width and calculating the modeling capacitance at the inverter output. Our optimization goals, as mentioned before, include both area and delay minimization. Therefore, we need mapping mechanisms for both area and delay. We derive equivalent-inverter area factors, which are the ratios of the incremental gate area change to the incremental equivalent-inverter area change. For example, the PMOS and the NMOS transistor area factors in the case of a NAND gate, for the kth gate input (i.e., the input of the kth transistor Tk) switching, are given by:
In the case of a complex gate, the gate structure is first reduced to a series interconnection of transistors and then mapped to an equivalent inverter [8].
Optimization Scheme
We use simulated annealing [9] for our optimization. The choice is motivated primarily by its flexibility in terms of the forms of the objective functions that it can handle. The complexity of the objective function arises from the need to handle delay, area, and power and also from the requirement of handling N critical paths simultaneously.
Any circuit can be assumed to consist of two parts (referred to as sub-circuits henceforth), critical and non-critical. Each sub-circuit comprises a set of interconnected elements. An element in our case is either a simple inverter or an equivalent inverter along with an equivalent capacitance at its output. A 3-input NAND gate, for example, gives rise to three equivalent inverters (one for each input). In the course of the optimization, whenever we perturb any of these equivalent inverters, we reflect this change in the sizes of the other two inverters, because a change in any of these inverters implies a resultant change in the size of the actual gate from which all these equivalent inverters have been derived. A perturbation of the width of an equivalent inverter also requires updating the equivalent capacitance at the output of the inverter.
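The bookkeeping for a multi-input gate can be sketched as a small data structure in which all equivalent inverters are derived from one shared gate width. The per-input width factors and the linear capacitance model below are placeholders invented for this example; the actual mapping is the one described in [8], and the class and method names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class EquivalentInverter:
    input_index: int
    width: float = 0.0
    output_cap: float = 0.0


@dataclass
class Gate:
    width: float                 # width of the actual gate transistors
    width_factors: List[float]   # placeholder mapping, one factor per input
    cap_per_width: float = 1.0   # placeholder equivalent-capacitance model
    inverters: List[EquivalentInverter] = field(default_factory=list)

    def __post_init__(self):
        self.inverters = [EquivalentInverter(k)
                          for k in range(len(self.width_factors))]
        self._refresh()

    def _refresh(self):
        # Re-derive every equivalent inverter from the shared gate width.
        for inv, factor in zip(self.inverters, self.width_factors):
            inv.width = factor * self.width
            inv.output_cap = self.cap_per_width * self.width

    def perturb_inverter(self, k, new_equiv_width):
        # A change to one equivalent inverter implies a new gate width,
        # which in turn updates the sibling inverters and the capacitances.
        self.width = new_equiv_width / self.width_factors[k]
        self._refresh()


# Hypothetical 3-input NAND with one equivalent inverter per input.
nand3 = Gate(width=6.0, width_factors=[0.5, 0.25, 0.125])
nand3.perturb_inverter(0, 4.0)   # the gate width becomes 8.0
print([(inv.width, inv.output_cap) for inv in nand3.inverters])
```

Keeping the gate width as the single source of truth means the three equivalent inverters cannot drift out of sync during the perturbations.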
Even though we do 'look' at all of the N critical paths in order to determine the goodness of a solution, our perturbation space is limited to the elements of the 'most critical path'. This speeds up the annealing process immensely without affecting the quality of the result. During each annealing iteration, the sizes of the critical-path components may change, and because another path may become most critical as a result of these changes, the current 'most critical path' is determined at the end of the iteration. The annealing process is then repeated on the new 'most critical path'. Such iterations are continued until a user-defined limit on the number of iterations, or a user-defined minimum improvement ratio between successive iterations, is reached.
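The iteration scheme above can be sketched with a generic simulated-annealing loop. Everything specific in this example is an assumption: the toy delay model (a drive term k/w plus a self-loading term 0.1*w per element), the cooling schedule, the step size, the width bounds, and the function names. It illustrates the control flow only and does not reproduce the ASAP cost function.

```python
import math
import random


def path_delay(path, widths, k):
    # Toy delay model (assumption): drive-limited term plus self-loading term.
    return sum(k[e] / widths[e] + 0.1 * widths[e] for e in path)


def max_delay(paths, widths, k):
    # Goodness of a solution is judged over all N monitored paths.
    return max(path_delay(p, widths, k) for p in paths)


def anneal_pass(paths, widths, k, rng, steps=200, temp=0.5, cooling=0.97,
                w_min=1.0, w_max=20.0):
    # Perturbations are restricted to the current most critical path.
    critical = max(paths, key=lambda p: path_delay(p, widths, k))
    current, cost = dict(widths), max_delay(paths, widths, k)
    best, best_cost = dict(current), cost
    for _ in range(steps):
        elem = rng.choice(critical)
        trial = dict(current)
        step = rng.uniform(-1.0, 1.0)
        trial[elem] = min(w_max, max(w_min, current[elem] + step))
        trial_cost = max_delay(paths, trial, k)
        # Metropolis rule: always accept improvements, sometimes accept worse.
        if trial_cost < cost or rng.random() < math.exp((cost - trial_cost) / temp):
            current, cost = trial, trial_cost
            if cost < best_cost:
                best, best_cost = dict(current), cost
        temp *= cooling
    return best, best_cost


def size_circuit(paths, widths, k, max_iters=10, min_improvement=0.01, seed=0):
    rng = random.Random(seed)
    cost = max_delay(paths, widths, k)
    for _ in range(max_iters):                  # user-defined iteration limit
        widths, new_cost = anneal_pass(paths, widths, k, rng)
        improvement = (cost - new_cost) / cost
        cost = new_cost
        if improvement < min_improvement:       # user-defined improvement ratio
            break
    return widths, cost


# Hypothetical circuit: two paths sharing the element g1.
paths = [["g1", "g2", "g3"], ["g1", "g4"]]
k = {"g1": 4.0, "g2": 6.0, "g3": 3.0, "g4": 9.0}
initial_widths = {name: 1.0 for name in k}
initial_cost = max_delay(paths, initial_widths, k)
final_widths, final_cost = size_circuit(paths, initial_widths, k)
print(initial_cost, final_cost)
```

Each call to `anneal_pass` re-determines the most critical path from the current sizes, so the path being perturbed can change from one outer iteration to the next.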
The cost function used can be expressed as: 
Results
Let us first consider the circuit shown in Figure 2. Table 1 compares our delay-optimization results for the above example with the results obtained from a different circuit optimization program [10]. Using a timing analyzer, it is found that the path c-e-f-g is most critical for a rising transition at the input c. The delay value obtained after the optimization of the circuit using ASAP is about 21% less than that predicted by the DROID [11] synthesis system, with both programs having identical limits on the objective function and on the minimum and the maximum allowable transistor sizes. It is to be noted, however, that when we refer to a particular path, we actually imply two sub-paths, one for each transition (rising and falling) at the input of the path. The delay of a critical path refers to the maximum of the delays of the two sub-paths. As mentioned earlier, the relative importance of power and/or delay optimization for a circuit can be weighted in ASAP. For the above circuit, Table 2 shows the power-delay tradeoff for several different weight factors. For an explicit illustration of the tradeoff, the area weight factor is set to zero in this case.
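The convention that a path stands for two sub-paths can be written down directly. The delay values and the second path name below are hypothetical and are not taken from Table 1; only the rule itself (take the maximum over the rising and the falling sub-path) comes from the text.

```python
def critical_path_delay(rise_delay_ns, fall_delay_ns):
    """Delay of a path = the slower of its rising and falling sub-paths."""
    return max(rise_delay_ns, fall_delay_ns)


def most_critical(path_delays):
    """Return the name of the path with the largest path delay."""
    return max(path_delays, key=lambda name: critical_path_delay(*path_delays[name]))


# Hypothetical (rise, fall) sub-path delays in ns.
path_delays = {"c-e-f-g": (1.2, 0.9), "other_path": (0.8, 1.0)}
print(most_critical(path_delays))
```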
A full-adder circuit is shown in Figure 3. For this circuit, Table 3 shows that our optimized-circuit delay is about 11% better than that predicted by DROID. Just like the tradeoff between power and delay, a suitable [...] Figure 4, Table 5 compares our optimization results with those of DROID. The circuit of Figure 4 can also be used to illustrate the necessity of sizing multiple critical paths at the same time. The results for single and multiple-critical-path optimizations are shown in Table 6. The initial-setting column shows the delay values obtained from a preliminary design. As is evident from the table, if we consider only one critical path for the optimization, the delay of the critical path (e-y-t-p-r-out3-out4) is reduced to 1.69 ns, but the delay of the second critical path (e-y-t-o-r-out3-out4) increases to 1.88 ns.
A consideration of multiple critical paths leads to an overall circuit delay of 1.77 ns.
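The comparison can be checked with a few lines of arithmetic using the delay values quoted above. The sketch assumes that, after the single-path run, the two listed paths are the slowest ones in the circuit; the variable names are illustrative.

```python
# Path delays (ns) after optimizing only the single most critical path.
single_path_run = {
    "e-y-t-p-r-out3-out4": 1.69,
    "e-y-t-o-r-out3-out4": 1.88,
}

# The circuit delay is set by the slowest path, not by the path that was sized.
circuit_delay_single = max(single_path_run.values())

# Overall circuit delay (ns) reported for the multiple-critical-path run.
circuit_delay_multi = 1.77

relative_gain = (circuit_delay_single - circuit_delay_multi) / circuit_delay_single
print(f"single-path run:   {circuit_delay_single:.2f} ns")
print(f"multiple-path run: {circuit_delay_multi:.2f} ns")
print(f"relative gain:     {100 * relative_gain:.1f} %")
```

Under that assumption, sizing a single path leaves the circuit limited by the 1.88 ns path, so the multiple-path result of 1.77 ns is roughly 6% faster.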
Figure 4: Combinational logic with multiple fanout. 
