A new performance and area optimization algorithm for complex VLSI systems is presented. It is widely believed within the VLSI CAD community that the relationship between the delay and the silicon area of a VLSI chip is convex. This conclusion is based on a simplified linear RC model for predicting gate delays. The proposed optimization algorithm uses a nonlinear, non-RC-based transistor delay model, which results in a non-convex relationship between the delay and the silicon area of a VLSI chip.
Introduction
The techniques for performance and area optimization of VLSI systems can be divided into two categories. One is to change the circuit structure by re-synthesizing or re-timing the target system. The other is to change the transistor sizes of the circuit so that the driving and load conditions in the circuit are optimal. The latter approach does not involve any topological changes and is often referred to as "transistor sizing".
Transistor sizing is the problem of finding an optimum set of transistor sizes in a circuit so that the circuit performance and/or circuit area are optimized. The size of a transistor includes two components: the transistor channel length, L, and the channel width, W. Because the transistor channel length is often fixed to its minimum value, varying transistor sizes to change the circuit performance is usually accomplished by varying the transistor widths. The objective function used most often has been f(A, T) = A × T, where A is the total active area and T is the longest delay of the circuit.
Most of the existing transistor sizing algorithms [1, 4, 8, 2] use linear RC delay models in the timing analysis. Fishburn and Dunlop [2] concluded that the relationship between the delay and the size of a VLSI system is convex. More recently, Sapatnekar [3] and Dunlop [9] formalized the assumption of convexity and further improved the performance of their transistor sizing algorithms. However, these algorithms based on RC delay models suffer from a number of drawbacks. Firstly, the RC-based delay models lead to inaccurate estimates of circuit delays. A deviation of 20% to 30% from SPICE simulation is expected when delay is predicted with an RC-based delay model [13]. Secondly, these algorithms are calculus-based and have difficulties in optimizing a discrete search space. Even though Shyu [1] made improvements in dealing with non-differentiability by expanding the definition of "gradient" to cover both differentiable and non-differentiable points, the algorithm in [1] was reported to have convergence problems on complex circuits and difficulties in automatically finding an optimum parameter set. Finally, the convex relationship between the circuit size and the circuit delay may not hold if a more accurate, non-RC delay model is used, as our experiments have shown [20]. The delay calculation proposed by Hofman and Kim [12] is accomplished by a lookup table. The table-lookup method does not have the inaccuracy problems of RC models, but it often suffers from inflexibility in adapting to different technologies.
We used an analytic delay model [10] which is similar to the analytic delay model described by Weste and Eshraghian [11]. The main difference is that our delay model takes the input slew rate into account, resulting in a more complex but more accurate delay model. Compared to SPICE simulations, our analytic delay model has a delay prediction accuracy between 0.5% and 2% over a wide range of input slew rates, transistor sizes and output load capacitances.
The research presented in this paper is motivated by the fact that the relationship between the circuit delay and the circuit size may be non-convex if a more accurate delay model is used rather than the simple RC model. We present a VLSI performance and area optimization algorithm that finds a set of optimum transistor sizes for a given VLSI CMOS circuit. To effectively search for an optimal solution, a chain-like geno-structure, referred to as a chromosome, is used to emulate a circuit path structure. The size of each gate on the given path is encoded in the corresponding gene on the chromosome. The search for a global optimum is carried out using genetic algorithms [7]. Because the optimization does not take interconnects into consideration, it is mainly intended for pre-layout optimization.
The experiments on the ISCAS benchmark circuits show that a substantial improvement can be achieved in circuit delay and, very often, in both circuit delay and circuit size. The remaining part of this paper is organized in 4 sections. In Section 2, a new critical path selection strategy used in our timing optimization is introduced. The organization of the timing and area optimization algorithm based on genetic algorithms is described in Section 3. Experimental results on the ISCAS benchmark circuits are discussed in Section 4. Finally, in Section 5, concluding remarks and future work are presented.
Critical Path Selection
Extracting critical paths from a given circuit is the first step before timing optimization can be performed. In timing and area optimization, the choice of an optimum set of transistor sizes often cannot be determined solely from the one critical path concerned. The close interdependency between different paths through shared gates plays an important role in both timing and area optimization. We refer to the gates that are not shared by more than one path as "unique gates". By optimizing only the unique gates in a given critical path, the interdependency between critical paths is eliminated, which simplifies the optimization process.
The selection of critical paths for timing and area optimization is based on the following rules:
1. Select critical paths according to their rankings in path delays, starting from the most critical path.
2. Select only the paths which have at least one unique gate.

Table 1 shows the number of unique gates for each of the 10 longest critical paths in the ISCAS benchmark circuit C432. The most critical path has 17 unique gates. The next critical path has only 3 unique gates. The number of unique gates in the remaining critical paths decreases as more critical paths are identified. This situation is very common among most of the ISCAS benchmark circuits.
The advantages of critical path selection based on both the path length and number of unique gates are twofold:
1. The identification of unique gates among the critical paths allows us to dramatically reduce the search space for optimizing a given circuit, and also enables us to isolate different critical paths during optimization. Such a simplification, although it may not yield a globally optimized solution, significantly reduces the complexity of the optimization problem.
2. In general, by ranking the critical paths, longer critical paths are optimized first, leaving only the unique gates in the shorter critical paths to be sized. This reduces the computational complexity of the shorter critical paths without affecting the longer critical paths.

Table 2 shows the number of critical paths with at least one unique gate among 10,000 critical paths extracted for each of the ISCAS benchmark circuits.
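The two selection rules can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `select_critical_paths` and the gate names are hypothetical, and a gate is treated as unique to a path when it does not appear on any higher-ranked path, which is one reading consistent with the description of Table 1.

```python
def select_critical_paths(ranked_paths):
    """Keep only the paths that contribute at least one unique gate.

    ranked_paths: list of paths (lists of gate names), ordered from the
    most critical path to the least critical one.
    Assumption: a gate counts as unique to a path if no higher-ranked
    path already contains it.
    """
    seen = set()        # gates that belong to a higher-ranked path
    selected = []
    for path in ranked_paths:
        unique = [gate for gate in path if gate not in seen]
        if unique:      # rule 2: at least one unique gate
            selected.append((path, unique))
        seen.update(path)
    return selected


# Hypothetical example: three ranked paths that share gates.
paths = [["g1", "g2", "g3"], ["g1", "g2", "g4"], ["g1", "g3"]]
for path, unique in select_critical_paths(paths):
    print(path, "-> unique gates:", unique)
```

In this example the third path is dropped because all of its gates already lie on higher-ranked paths, so only the unique gates of the first two paths remain in the search space.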
A Transistor Sizing Algorithm Based On Genetic Algorithms
Genetic algorithms (GAs) are discrete, probabilistic optimization algorithms. The problem of transistor sizing consists of two parts: timing analysis to extract critical paths from a circuit; and optimization of transistor sizes on the extracted critical paths. We map the transistor sizing problem to a problem for the genetic algorithms as follows:
1. An extracted critical path is treated as a chromosome. The length of the chromosome is determined by the number of genes on the chromosome.
2. A gate on a critical path is treated as a gene of the chromosome. A gene is encoded as an integer which represents the size of the corresponding gate on the critical path.
3. Assuming that only CMOS gates are used and that the rise time and fall time of the output are balanced, the size of a gate is represented by the effective transistor channel width of the n-tree in the gate. For instance, in a 2-input NAND gate, the effective transistor channel width of the n-tree is half of the n-type transistor channel width.
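A minimal sketch of this encoding is given below. The helper names and the unit width step are assumptions made for illustration; the text only states that a gene is an integer representing the gate size and that the n-tree of a 2-input NAND gate has half the effective width of a single n-type transistor.

```python
def effective_ntree_width(channel_width, series_transistors):
    """Effective width of an n-tree made of identical transistors in series.

    For a 2-input NAND gate (two n-type transistors in series) this gives
    half of the n-type transistor channel width.
    """
    return channel_width / series_transistors


def decode(chromosome, unit_width):
    """Map integer genes to effective widths.

    Assumption: gene value k stands for k times a hypothetical unit width.
    """
    return [gene * unit_width for gene in chromosome]


# Hypothetical 3-gate path: one integer gene per gate, in path order.
chromosome = [1, 3, 2]
widths = decode(chromosome, unit_width=1.5)
nand2_width = effective_ntree_width(4.0, series_transistors=2)
```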
The optimization process is divided into the following steps:
1. Extract the critical paths and rank them in the order of path delays.
2. Take one of the critical paths which does not satisfy the delay requirement and generate the initial chromosome population.
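3. Calculate the load capacitance of every gate on the critical path according to the gate sizes.
4. Calculate the path delay and the fitness value of each chromosome using the objective function f(A, T) = A × T (1), where A is the total active area and T is the path delay. The input slew rate of a gate is approximated by the rise time or fall time of the previous gate.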
5. Calculate the average fitness value for the entire population.
6. If at least one of the chromosomes representing the corresponding path satisfies the timing and area constraints, or the improvement in the average fitness value of the entire population is within a specified threshold, the optimization process is complete. Otherwise, take two chromosomes from the population according to a set of selection criteria related to their fitness values, and perform crossover or mutation operations to generate a pair of new chromosomes, i.e. a pair of new path configurations.
7. If the new paths are better in terms of their fitness values, they are kept in the population and two existing chromosomes are eliminated from the population according to a set of criteria. Otherwise, the new paths are discarded.
8. Go to Step 3.
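The loop described by these steps can be sketched as a small steady-state genetic algorithm. The sketch makes several assumptions that are not taken from the paper: `path_delay` is a crude placeholder rather than the analytic model of [10], parents are picked by binary tournament, a child replaces the current worst chromosome when it is better, and the loop stops after a fixed number of iterations instead of testing the constraints or the average-fitness threshold. All names and constants are hypothetical.

```python
import random

MIN_SIZE, MAX_SIZE = 1, 16   # hypothetical integer range of a gene
FINAL_LOAD = 8.0             # hypothetical load on the last gate of the path


def path_delay(chrom):
    """Placeholder delay: each gate drives the input of the next gate."""
    loads = list(chrom[1:]) + [FINAL_LOAD]
    return sum(1.0 + load / size for size, load in zip(chrom, loads))


def fitness(chrom):
    """Objective f(A, T) = A * T with A as the sum of gate sizes; lower is better."""
    return sum(chrom) * path_delay(chrom)


def crossover(a, b, rng):
    """One-point crossover; requires at least two genes."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:], b[:cut] + a[cut:]


def mutate(chrom, rng):
    """Replace one randomly chosen gene with a random size."""
    child = list(chrom)
    child[rng.randrange(len(child))] = rng.randint(MIN_SIZE, MAX_SIZE)
    return child


def pick_parent(pop, rng):
    """Binary tournament: the fitter of two random chromosomes."""
    a, b = rng.sample(pop, 2)
    return a if fitness(a) <= fitness(b) else b


def optimize_path(n_gates, pop_size=100, p_xover=0.6, p_mut=0.1,
                  iterations=500, seed=0):
    rng = random.Random(seed)
    pop = [[rng.randint(MIN_SIZE, MAX_SIZE) for _ in range(n_gates)]
           for _ in range(pop_size)]
    for _ in range(iterations):
        p1, p2 = pick_parent(pop, rng), pick_parent(pop, rng)
        if rng.random() < p_xover:
            c1, c2 = crossover(p1, p2, rng)
        else:
            c1, c2 = list(p1), list(p2)
        if rng.random() < p_mut:
            c1 = mutate(c1, rng)
        if rng.random() < p_mut:
            c2 = mutate(c2, rng)
        for child in (c1, c2):
            # Keep a child only if it beats the current worst chromosome.
            worst = max(range(pop_size), key=lambda i: fitness(pop[i]))
            if fitness(child) < fitness(pop[worst]):
                pop[worst] = child
    return min(pop, key=fitness)


best = optimize_path(n_gates=17)
print(best, fitness(best))
```

Because only the worst chromosome is ever replaced, the best fitness in the population never degrades from one iteration to the next. The defaults for population size, crossover probability and mutation probability mirror the values reported in the experiments.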
The above optimization algorithm is illustrated in Figure 1. The critical path of a given circuit is first extracted, as shown in bold lines. In Figure 1, we assume that there are 17 gates on the critical path b-to-y. The size of each gate on the critical path is then initialized to a random value, and the sizes (integers) are stored in a chromosome structure in the order in which the gates appear on the critical path. Many such chromosomes are generated in the same way to form an initial population. Genetic algorithms are then applied to the population to obtain the final optimized gate sizes, which are shown in Figure 1 as the optimized chromosome.
There are several parameters which can determine the quality of the optimization using genetic algorithms. These parameters include the population size, P_size, the probability of using the crossover operator, P_xover, and the probability of using the mutation operator, P_mut. The population size affects the convergence behavior as well as the final results. Typically, the larger the population, the better the quality of the final results and the longer it takes to reach them. The probabilities of applying the crossover and mutation operators also have some impact on the quality of the final results. The optimal values of these probabilities need to be determined experimentally.
The transistor sizing algorithm based on the genetic algorithm was implemented in the C programming language with about 20,000 lines of code.
Experimental Results
The transistor size optimizer was applied to the ISCAS85 benchmark circuits. A commercial 1.5 µm CMOS technology was used as the target technology. We compared the optimized results to two configurations. Under the first configuration, all the transistor sizes are set to the minimum size allowed by the technology. Under the second configuration, all the transistor sizes are set to the sizes of a commercial standard cell library, as if these benchmark circuits were implemented using the standard cell library. In most cases, our transistor size optimizer was able to find better solutions in terms of the objective function. In some cases, the circuit size and the circuit delay were both reduced compared to the two configurations. The parameters for the genetic algorithms chosen for the experiments were: population size = 100, P_xover = 0.6, and P_mut = 0.1. The rationale for choosing these parameters will be discussed later in this section.
In order to compare the optimized results with the two different configurations, we define the figure of merit (FM) as:

$$\mathrm{FM} = \frac{\text{initial delay or size} - \text{optimized delay or size}}{\text{initial delay or size}} \qquad (2)$$

Table 3 shows the comparison between the optimized ISCAS85 benchmark circuits and the minimum size configuration. The results are as expected. We reduced the circuit delay at the cost of an increase in circuit size in 8 out of 10 cases. For C1355, our optimizer could not obtain any reduction in delay even when the circuit size was dramatically increased. Our optimizer ran out of memory during the optimization for C6288. Table 4 shows the comparison between the optimized ISCAS85 benchmark circuits and the standard cell library implementation. In 7 out of 10 benchmark circuits, we achieved a reduction in both the delay and the size of the circuits. The reduction in circuit size ranges from 4.6% to 31.3%. At the same time, the reduction in circuit delay ranges from 2.4% to 22.5%. For C499 and C1355, the circuit sizes were reduced at the cost of a smaller increase in circuit delay. This evidence demonstrates that the relationship between circuit size and circuit delay may not be convex, as many others have believed.
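As a small worked illustration of the figure of merit in Equation (2), the sketch below evaluates it for made-up numbers. The function name and the values are hypothetical and are not taken from Table 3 or Table 4.

```python
def figure_of_merit(initial, optimized):
    """FM = (initial - optimized) / initial, for either delay or size.

    A positive value is a reduction relative to the initial configuration,
    a negative value is an increase.
    """
    return (initial - optimized) / initial


# Hypothetical values: delay drops from 10.0 to 8.0, size grows from 100 to 125.
delay_fm = figure_of_merit(10.0, 8.0)     # a reduction, so FM is positive
size_fm = figure_of_merit(100.0, 125.0)   # an increase, so FM is negative
```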
During the optimization, we also kept track of the number of unique gates which increased or decreased in size compared to the initial configuration. They are shown in the column "% of gates" under "size increase" and "size decrease" in Table 4. This is an indication of whether the genetic algorithm has successfully identified a small number of critical gates to improve the circuit delay. In most cases, the number of gates that increase in size is smaller than the number of gates that decrease in size.
We also experimented with various parameters for the genetic algorithm, such as the population size and the probabilities of performing crossover and mutation. Figure 2 shows the objective values, A × T, of a 17-gate path in C432 during the optimization, in which the genetic algorithm updates the generation of the population as new and better chromosomes are generated. The population size was varied from 10 to 10,000 chromosomes. The objective values obtained with larger population sizes are better than those obtained with smaller population sizes. However, a larger population size requires a longer computation time to update each population. Our experiments show that the results with a population size of 100 are similar to the results with population sizes of 1000 and 10,000 in terms of their objective values after a certain number of generations. This behavior is typical of the other benchmark circuits. Figure 3 shows the same relationship for C7552 with a path length of 43 gates.
The probabilities of performing crossover and mutation are less well behaved than the population size when it comes to finding the best value. Choosing the best value for a given circuit requires a certain amount of experimentation. However, varying these two parameters does not have a significant impact on the final results as long as they are within a certain range. Figures 4 and 5 show the relationship between the objective values and the populations generated by the genetic algorithm under various probabilities of performing mutation, P_mut. As these figures indicate, no definite conclusion can be drawn from the experiments. Our choice of P_mut is 0.1.
Although the delay model in [10] takes the load capacitance into consideration, the load capacitance contributed by on-chip interconnects was not included in the experiments presented in this paper due to the lack of layout information. The load capacitance contributed by the fanout gate capacitance was, however, included in our experiments. This is the first time, to the best of our knowledge, that the entire set of ISCAS85 benchmark circuits has been used for evaluations. All previous publications on transistor sizing have given experimental results based on several non-public-domain circuits to which we do not have access. Therefore, a direct comparison of our results with previously published results is not possible.
Conclusion
By using a non-RC-based delay model, we have shown that the relationship between the delay and the size of a given circuit may not be convex, as many others have believed. Therefore, we generalized the transistor sizing problem as a non-convex optimization problem and used genetic algorithms to search for a global optimum for both the delay and the size of a circuit. The results show that genetic algorithms have several advantages over traditional calculus-based algorithms, such as the ability to search for a global optimum in a discrete, non-convex search space. By using a proper path selection method, the search space can be greatly reduced, which makes it feasible to optimize a large, complex VLSI chip. We are especially encouraged by the results that show opportunities to reduce both circuit delay and circuit size for a large number of VLSI circuits if the transistor sizes are chosen properly.
One problem with our current experiments is that they fail to consider the impact of on-chip interconnects. As IC technology advances to deep-submicron feature sizes, the impact of on-chip interconnects on the overall chip optimization will greatly increase. Our future work will include incorporating layout information about on-chip interconnects by back-annotating such information to the gate level. In addition, area estimation will include both the active area in terms of transistor sizes and the routing area.

Table Captions
Table 1. The distribution of unique gates for the 10 most critical paths in C432.
Table 2. The number of critical paths with at least one unique gate in the ISCAS85 benchmark circuits.
Table 3. Results of the optimized ISCAS85 benchmark circuits with the minimum initial size for all the transistors.
Table 4. Results of the optimized ISCAS85 benchmark circuits with the initial implementation using a commercial standard cell library.
