GenFin: Genetic Algorithm-Based Multiobjective Statistical Logic Circuit Optimization Using Incremental Statistical Analysis

Aoxiang Tang and Niraj K. Jha, Fellow, IEEE

Abstract-As the semiconductor technology node scales into the deep submicrometer regime, it has become very difficult to obtain high IC yields because process-voltage-temperature (PVT) variations induce large spreads in delay and power. In this paper, we propose a new framework, called GenFin, which is, as far as we know, the first to target the multiobjective yield optimization of logic circuits. Since FinFETs are a promising substitute for planar CMOS at the 22-nm technology node and beyond, we evaluate the framework with a 22-nm FinFET logic library. By combining a genetic algorithm (GA) with adaptive multiobjective optimization, GenFin produces a set of nondominated logic circuits whose timing, leakage power, and dynamic power yields are simultaneously optimized. This helps designers make informed tradeoff decisions and avoid suboptimal solutions. We also propose an incremental statistical circuit analyzer, called incremental FinPrin, that speeds up statistical static timing analysis by up to 9.6× and statistical power analysis by up to 2235.7×, while incurring errors of at most 0.031% in the mean and 0.74% in the standard deviation relative to nonincremental analysis. We use heuristics based on deterministic timing analysis and gate criticality to reduce the GA search space and improve the quality of its solutions. We present extensive experimental results to demonstrate the efficacy of GenFin.
Index Terms-FinFETs, genetic algorithm (GA), multiobjective optimization, Pareto rank, process-voltage-temperature (PVT) variations, statistical analysis, yield analysis.
I. INTRODUCTION
The never-ending demand for more powerful ICs has driven semiconductor scaling into the deep submicrometer regime, making it possible to place billions of transistors on a single die. FinFET technology has begun replacing traditional planar CMOS technology, owing to its better control over short-channel effects (SCEs) [1]. However, process-voltage-temperature (PVT) variations continue to pose a major challenge to VLSI designers [2]-[8]. Uncertainty due to variations arising from the manufacturing process and the environment makes it extremely difficult to predict circuit performance. Despite their advantages, FinFETs are still susceptible to PVT variations [6], [8]. These variations are mainly caused by lithographic constraints and difficulties in gate workfunction engineering, and they manifest as large spreads in power and delay. PVT variations have been extensively studied from the device level to the architecture level [4], [5], [7], [9].
Tackling PVT variations necessitates the development of statistical circuit analysis and optimization techniques. Statistical analysis techniques estimate the distributions of various circuit performance metrics using concepts from probability theory and statistics. From the cumulative distribution functions (CDFs) of the circuit metrics, the designer can determine the probability that an IC satisfies the design constraints. This probability is called the circuit yield. Statistical optimization techniques enable tradeoffs between circuit timing and power so that the various yields can be optimized. For example, they may perform gate sizing in a statistically aware manner to improve yield.
Prior work on IC statistical analysis falls into two main categories: 1) timing analysis and 2) power analysis. Conventional static timing analysis (STA) addresses the timing analysis problem by analyzing multiple process corners. However, this is generally considered inadequate because only a small number of process corners can be analyzed while remaining computationally efficient. Thus, such techniques are overly conservative and pessimistic. Hence, researchers have proposed statistical STA (SSTA), which treats delays as probability density functions (pdfs) instead of fixed numbers [10]-[15]. Chang and Sapatnekar [10], [11] present an SSTA technique that computes delay distributions, while considering spatial correlations, using a PERT-like circuit traversal. Visweswariah et al. [12] propose a block-based incremental timing analysis framework using a canonical first-order delay model.
Whereas most attention has been paid to timing analysis, a few researchers have also investigated power variations caused by PVT variations [16]-[18]. Mukhopadhyay and Roy [16] present an analytical model to estimate leakage variations due to gate tunneling and reverse-biased source/drain junction band-to-band tunneling, as well as due to correlations among these components. Chang and Sapatnekar [17] propose a method that treats leakage as a lognormal random variable (RV). While the above methods evaluate CMOS circuits, Yang and Jha [18] present a FinFET logic library that can be used to model timing, leakage power, and dynamic power distributions, based on an augmentation of the frameworks presented in [10], [11], and [17].
Statistical optimization techniques, on the other hand, have received much less attention. Most work on circuit optimization employs the classical sensitivity-based gate sizing algorithm. The statistical timing optimization method in [19] uses an objective function that combines a profit function with the delay pdf. Guthaus et al. [20] propose a criticality-guided statistical timing optimization method that sets a criticality threshold for selecting candidate gates. Hwang et al. [21] present a way to compute gate swap sensitivity based on direct use of timing yield and yield slack. However, these methods have several limitations. First, sensitivity-based methods may get stuck at suboptimal points because their greedy heuristics only allow the gate with the maximum sensitivity to be replaced [22]. Second, power and timing are not statistically optimized simultaneously: typically, one metric acts as the primary objective, while the other is either ignored or treated as a secondary objective. Third, dynamic power, interconnect delay and power, and spatial correlations are ignored. Last, but not least, all previous methods produce only one best solution. Thus, designers do not have an opportunity to make tradeoffs between various metrics. A designer may well prefer another solution if it improves one metric substantially with only a small penalty in another. Yang and Jha [18] report that, compared with the best timing solution for a FinFET logic circuit, up to 60% power reduction can be achieved with 10% timing slack. However, this result is based on deterministic circuit optimization.
To address the limitations mentioned above, we propose a framework called GenFin. It uses a genetic algorithm (GA) to simultaneously optimize timing, leakage power, and dynamic power yields through gate sizing. GA has found wide popularity as an optimization algorithm due to its unique advantages [23]. It mimics nature's evolution process by enabling the fittest chromosomes to survive, while giving chromosomes a chance to share information with each other. GA has been studied in the context of hardware-software cosynthesis [24], [25] and in a few works dedicated to deterministic circuit optimization [26]-[28].
By incorporating an adaptive multiobjective ranking approach within GA, GenFin is able to explore a pool of solutions that optimize various competing costs simultaneously, instead of collapsing all costs into one using a weighted-sum approach, as is often done. It produces a set of nondominated solutions that provides a fuller picture of the solution space, enabling designers to make informed tradeoffs. GA is computationally demanding since it may have to evaluate thousands of potential solutions. Thus, performing the statistical timing/power analysis incrementally helps reduce the computational burden. We show how our prior statistical analysis framework, FinPrin [18], can be turned into an incremental analysis framework without much loss in accuracy. This makes design space exploration much more efficient. Currently, GenFin does not support common path pessimism removal on the clock path, which entails removal of artificially induced pessimism between a launch and capture flip-flop pair during timing analysis [29]. However, this could be addressed in GenFin in the future by setting the correlation of the RVs representing the components on the common path to 1.
GenFin has the following features: 1) As far as we know, GenFin is the first methodology and tool that can perform multiobjective (timing, dynamic power, and leakage power) yield optimization of logic circuits. 2) It contains novel incremental statistical timing/power analysis methods to speed up statistical optimization. 3) It adopts adaptive multiobjective ranking to help escape local minima. 4) It considers spatial correlations, wire variability, and circuit place-and-route information. 5) It employs statistical PVT models of FinFET logic gates for evaluation purposes.

The remainder of this paper is organized as follows. Section II introduces background material related to this paper. Section III discusses the GenFin framework, detailing its various algorithms. Section IV presents the experimental results. Finally, Section V concludes the paper.
II. BACKGROUND
In this section, we introduce basic concepts related to the PVT variations of FinFET devices, statistical analysis, and GA. Fig. 1 shows the geometry of a FinFET device, which employs a thin fin as the channel body. $L_G$, $T_{OX}$, $T_{SI}$, and $H_{FIN}$ denote the gate length, oxide thickness, fin thickness, and fin height, respectively. The physical parameter values for a 22-nm FinFET device are listed in Table I. FinFET is a nonplanar double-gate device that provides tight control over SCEs [30]. Besides, it has a higher ON- [...]

A. PVT Variations

In addition to process variations, the supply voltage $V_{dd}$ may also vary across the circuit because of unbalanced power distribution and IR drop. The $V_{dd}$ drop may be as high as 15% across an 8-mm × 8-mm IC [31]. $V_{dd}$ may also be impacted by a poorly designed voltage regulator. Since $V_{dd}$ variations impact both $I_{ON}$ and $I_{OFF}$, they lead to variations, albeit to very different degrees, in timing, leakage power, and dynamic power.

Temperature is another important factor whose variations primarily impact leakage current and timing [18]. Since the circuit temperature may settle at any value, we do not model temperature as an RV, but as an independent parameter that has no correlation with process and voltage (PV) variations.
B. Statistical Modeling
As mentioned in Section II-A, there are various process variation sources in FinFET devices and wires. We assume that the process parameters are spatially correlated and modeled as Gaussian RVs. First, consider an arbitrary function $s = F(\vec{P})$, where $\vec{P} = (p_1, \ldots, p_n)$ represents a set of RVs. Here, $s$ may represent delay, dynamic power, or leakage power, and is a function of various PV parameters. The first-order Taylor expansion around the mean parameter vector $\vec{P}_0$ can be used to express the function as

$$s \approx F(\vec{P}_0) + \sum_{i=1}^{n} \frac{\partial F}{\partial p_i}\bigg|_{\vec{P}_0} \left(p_i - p_{i,0}\right).$$

Hence, the mean and variance of $s$ can be expressed as

$$\mu_s = F(\vec{P}_0), \qquad \sigma_s^2 = \sum_{i=1}^{n} \left(\frac{\partial F}{\partial p_i}\right)^2 \sigma_{p_i}^2 + 2\sum_{i<j} \frac{\partial F}{\partial p_i}\frac{\partial F}{\partial p_j}\,\mathrm{cov}(p_i, p_j),$$

where $\mathrm{cov}(p_i, p_j)$ denotes the covariance of $p_i$ and $p_j$. Spatial correlations are modeled using a multilevel quadtree [32]. The correlation between two logic gates depends on the grids in which the gates are located. Gates in the same grid are assumed to be perfectly correlated, and correlation decreases as the distance between the grids increases.
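The first-order mean and variance formulas can be checked numerically with a short sketch. It assumes the sensitivities $\partial F/\partial p_i$ and the full parameter covariance matrix are already known (for example, from the quadtree correlation model); the function name `first_order_stats` is hypothetical and not part of FinPrin.

```python
import math

def first_order_stats(f_nominal, grads, cov):
    """Mean and standard deviation of s = F(P) under a first-order Taylor model.

    f_nominal: F evaluated at the mean parameter vector P0
    grads:     partial derivatives dF/dp_i evaluated at P0
    cov:       n x n covariance matrix of the parameters (list of lists)
    """
    n = len(grads)
    var = 0.0
    for i in range(n):
        for j in range(n):
            # Summing over all (i, j) pairs yields the diagonal variance terms
            # plus every off-diagonal covariance term counted twice.
            var += grads[i] * grads[j] * cov[i][j]
    return f_nominal, math.sqrt(max(var, 0.0))
```

For two parameters with standard deviations 1 and 2 and correlation 0.5, `cov = [[1.0, 1.0], [1.0, 4.0]]`; with sensitivities `[3.0, 1.0]` the variance is 9 + 2·3·1·1 + 4 = 19.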
For wire delay, we use the Elmore delay model, expressed as follows:

$$D_w = R_w\left(\frac{C_w}{2} + C_{in}\right),$$

where $R_w$ is the wire resistance, $C_w$ is the wire capacitance, and $C_{in}$ is the input capacitance of the gates connected to the receiver end of the wire. Gate delay is modeled based on a modification of the well-known Horowitz delay approximation [33] as follows [18]:
where $S_{in}$ is the input slope; $\tau = R_{tot} \cdot C_{tot}$ is the time constant of the gate when $S_{in} = 0$; $R_{tot}$ is the sum of the gate's $R_{out}$ and the $R_w$ of its fan-out wires; $C_{tot}$ is the total load capacitance, i.e., the sum of the gate's $C_{out}$, the $C_w$ of the fan-out wires, and the $C_{in}$ of the gates in the following stage; and $\alpha_d$ and $\beta_d$ are scaling factors that are assigned different values for different gates and sizes for the rising/falling transitions. The derivative of gate delay is
Hence, we can actually decompose
The input slope of the gate is derived from the output slope of the previous gate. According to the empirical model proposed in [18] , the output slope can be expressed as
where $S_{base} = \ln 9 \cdot \tau$ is the output slope when $S_{in} = 0$, and $\alpha_s$, $\beta_s$, and $\gamma_s$ are scaling factors derived for each gate type and size in the FinFET logic library. Similarly, as in the case of (6), by taking the derivative of $S_{out}$, we can obtain the derivative of $S_{in}$ for the next gate. Then, we can again decompose [35].
Dynamic power is modeled using the classical form

$$P_{dyn} = \alpha C V_{dd}^2 f, \tag{8}$$

where $\alpha$ is the average switching activity, $C$ is the total capacitance, $V_{dd}$ is the supply voltage, and $f$ is the frequency. Following an approach similar to the one adopted for delay, sensitivities of dynamic power with respect to the PVT parameters can be decomposed
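As a small worked instance of the classical dynamic power form, the sketch below propagates variations in capacitance and supply voltage to a first-order mean and standard deviation using the analytic sensitivities $\partial P/\partial C = \alpha V_{dd}^2 f$ and $\partial P/\partial V_{dd} = 2\alpha C V_{dd} f$. It assumes, as a simplification, that $C$ and $V_{dd}$ are independent normal RVs; a fuller treatment would include their covariance as in the first-order variance formula. The function name is hypothetical.

```python
import math

def dynamic_power_stats(alpha, c_mean, c_sd, vdd_mean, vdd_sd, freq):
    """First-order mean/std of P_dyn = alpha * C * Vdd^2 * f.

    Assumption: C and Vdd are independent normal RVs (covariance ignored).
    """
    p_mean = alpha * c_mean * vdd_mean ** 2 * freq
    dp_dc = alpha * vdd_mean ** 2 * freq               # dP/dC at the mean point
    dp_dvdd = 2.0 * alpha * c_mean * vdd_mean * freq   # dP/dVdd at the mean point
    # Independent contributions add in variance.
    p_sd = math.hypot(dp_dc * c_sd, dp_dvdd * vdd_sd)
    return p_mean, p_sd
```

With $\alpha = 0.1$, $C = 1$ fF (5% sigma), $V_{dd} = 0.8$ V (40-mV sigma), and $f = 1$ GHz, the supply-voltage term dominates the spread because power depends quadratically on $V_{dd}$.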
Leakage power is treated differently from delay and dynamic power because the relation between leakage power and the PVT variables is exponential rather than polynomial [18]. It is therefore modeled as a lognormal RV, i.e., an RV whose exponent is a normal RV. The leakage power is given as
where $a_i$ is a coefficient. We obtain the leakage sensitivities directly from TCAD simulations.
C. Statistical Operations
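In SSTA, two statistical operators are commonly used: statistical sum and statistical max. Suppose $X$ and $Y$ are two normal RVs, $X \sim \mathcal{N}(\mu_X, \sigma_X^2)$ and $Y \sim \mathcal{N}(\mu_Y, \sigma_Y^2)$, with correlation coefficient $r$. The statistical sum $S = X + Y$ is normally distributed with mean and variance

$$\mu_S = \mu_X + \mu_Y, \qquad \sigma_S^2 = \sigma_X^2 + \sigma_Y^2 + 2 r \sigma_X \sigma_Y.$$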
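The sum of lognormal RVs is not known to have a closed form. However, it may be approximated as a lognormal RV using Wilkinson's method [17]. The sum of $t$ lognormal RVs $S = \sum_{i=1}^{t} e^{Y_i}$, where $Y_i \sim \mathcal{N}(\mu_{Y_i}, \sigma_{Y_i}^2)$, is approximated as the lognormal RV $e^{Z}$, where $Z \sim \mathcal{N}(\mu_Z, \sigma_Z^2)$. To obtain the mean and standard deviation of $Z$, the first two moments $u_1$ and $u_2$ of $S$ are required:

$$u_1 = E[S] = \sum_{i=1}^{t} e^{\mu_{Y_i} + \sigma_{Y_i}^2/2},$$

$$u_2 = E[S^2] = \sum_{i=1}^{t} e^{2\mu_{Y_i} + 2\sigma_{Y_i}^2} + 2\sum_{i=1}^{t-1}\sum_{j=i+1}^{t} e^{\mu_{Y_i} + \mu_{Y_j} + \frac{1}{2}\left(\sigma_{Y_i}^2 + \sigma_{Y_j}^2 + 2 r_{ij}\sigma_{Y_i}\sigma_{Y_j}\right)},$$

where $r_{ij}$ is the correlation between $Y_i$ and $Y_j$. Now, $\mu_Z$ and $\sigma_Z$ can be derived from

$$\mu_Z = 2\ln u_1 - \frac{1}{2}\ln u_2, \qquad \sigma_Z^2 = \ln u_2 - 2\ln u_1.$$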
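Wilkinson's moment matching can be written in a few lines. This is a minimal sketch, not FinPrin's code: it assumes the exponents $Y_i$ are given by their means, standard deviations, and a correlation matrix, and the function name `wilkinson` is hypothetical.

```python
import math

def wilkinson(mus, sigmas, corr):
    """Approximate sum_i exp(Y_i) by exp(Z), Z ~ N(mu_Z, sigma_Z^2).

    mus, sigmas: mean and standard deviation of each normal exponent Y_i
    corr:        t x t correlation matrix r_ij of the exponents
    Returns (mu_Z, sigma_Z).
    """
    t = len(mus)
    # First moment: E[S] = sum_i E[exp(Y_i)]
    u1 = sum(math.exp(m + s * s / 2.0) for m, s in zip(mus, sigmas))
    # Second moment: sum_i E[exp(2 Y_i)] + 2 * sum_{i<j} E[exp(Y_i + Y_j)]
    u2 = sum(math.exp(2.0 * m + 2.0 * s * s) for m, s in zip(mus, sigmas))
    for i in range(t):
        for j in range(i + 1, t):
            var_ij = sigmas[i] ** 2 + sigmas[j] ** 2 + 2.0 * corr[i][j] * sigmas[i] * sigmas[j]
            u2 += 2.0 * math.exp(mus[i] + mus[j] + var_ij / 2.0)
    # Match the first two moments of exp(Z) to (u1, u2).
    mu_z = 2.0 * math.log(u1) - 0.5 * math.log(u2)
    sigma_z = math.sqrt(max(math.log(u2) - 2.0 * math.log(u1), 0.0))
    return mu_z, sigma_z
```

A useful sanity check: two perfectly correlated, identical exponents give $S = 2e^{Y}$, so the method should return $\mu_Z = \mu_Y + \ln 2$ and $\sigma_Z = \sigma_Y$ exactly.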
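Statistical max is often used in delay computation, e.g., to find the latest arrival time at a gate output. The mean and variance of $T = \max(X, Y)$ can be approximated as follows [11]:

$$\mu_T = \mu_X \Phi(\alpha) + \mu_Y \Phi(-\alpha) + a\,\phi(\alpha),$$

$$\sigma_T^2 = (\mu_X^2 + \sigma_X^2)\Phi(\alpha) + (\mu_Y^2 + \sigma_Y^2)\Phi(-\alpha) + (\mu_X + \mu_Y)\,a\,\phi(\alpha) - \mu_T^2,$$

where

$$a = \sqrt{\sigma_X^2 + \sigma_Y^2 - 2 r \sigma_X \sigma_Y}, \qquad \alpha = \frac{\mu_X - \mu_Y}{a},$$

and $\phi(\cdot)$ and $\Phi(\cdot)$ are the pdf and CDF of the standard normal distribution, respectively. If $\sigma_X = \sigma_Y$ and $r = 1$, then $a = 0$, and the result of the statistical max operation is simply the RV with the larger mean.

If there is another RV $Z$ with correlation coefficient $r_1$ with $X$ and $r_2$ with $Y$, then the correlation $r_3$ between $Z$ and $T = \max(X, Y)$ is given by

$$r_3 = \frac{r_1 \sigma_X \Phi(\alpha) + r_2 \sigma_Y \Phi(-\alpha)}{\sigma_T}.$$

Therefore, by applying the above equations pairwise and recursively, we can obtain the max of any number of RVs.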
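The pairwise, recursive use of Clark's max can be sketched as follows. The running max is folded with one new RV at a time, and its correlation with the RVs not yet folded in is updated with the $r_3$ formula. Function names are hypothetical; the standard normal pdf and CDF are built from `math.exp` and `math.erf`.

```python
import math

def _pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def _cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def clark_max(mx, sx, my, sy, r):
    """Moment-matched T = max(X, Y); returns (mu_T, sigma_T, alpha, a)."""
    a = math.sqrt(max(sx * sx + sy * sy - 2.0 * r * sx * sy, 0.0))
    if a == 0.0:
        # X - Y is a constant: the max is the RV with the larger mean.
        return (mx, sx, math.inf, 0.0) if mx >= my else (my, sy, -math.inf, 0.0)
    alpha = (mx - my) / a
    p, q, d = _cdf(alpha), _cdf(-alpha), _pdf(alpha)
    mu_t = mx * p + my * q + a * d
    second = (mx * mx + sx * sx) * p + (my * my + sy * sy) * q + (mx + my) * a * d
    return mu_t, math.sqrt(max(second - mu_t * mu_t, 0.0)), alpha, a

def correlation_with_max(r1, r2, sx, sy, alpha, sigma_t):
    """r3 = corr(Z, max(X, Y)) given r1 = corr(Z, X) and r2 = corr(Z, Y)."""
    if sigma_t == 0.0:
        return 0.0
    return (r1 * sx * _cdf(alpha) + r2 * sy * _cdf(-alpha)) / sigma_t

def max_many(mus, sigmas, corr):
    """Max of n correlated normal RVs by pairwise application of clark_max."""
    mu_t, sd_t = mus[0], sigmas[0]
    r_t = list(corr[0])  # correlation of the running max with every X_k
    for j in range(1, len(mus)):
        mu_new, sd_new, alpha, _ = clark_max(mu_t, sd_t, mus[j], sigmas[j], r_t[j])
        # Update correlations with the RVs that have not been folded in yet.
        for k in range(j + 1, len(mus)):
            r_t[k] = correlation_with_max(r_t[k], corr[j][k], sd_t, sigmas[j], alpha, sd_new)
        mu_t, sd_t = mu_new, sd_new
    return mu_t, sd_t
```

For two independent standard normals, the sketch reproduces the known result $\mu_T = 1/\sqrt{\pi}$ and $\sigma_T^2 = 1 - 1/\pi$.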
D. Arrival Tightness Probability and Gate Criticality
Next, we discuss two important concepts in SSTA: 1) arrival tightness probability (ATP) and 2) gate criticality [12].
Gate delay varies with the input slope, as indicated by (5). Suppose $X$ and $Y$ are the arrival times at the output of a two-input NAND gate through its two inputs. In SSTA, arrival times are treated as RVs. Then, the ATP of $X$ is the probability that $X$ determines the arrival time at the output, i.e., the probability that $X$ is larger than $Y$:

$$\mathrm{ATP}_X = P(X > Y) = \Phi\left(\frac{\mu_X - \mu_Y}{\sqrt{\sigma_X^2 + \sigma_Y^2 - 2 r_{XY}\sigma_X\sigma_Y}}\right),$$

where $r_{XY}$ is the correlation between $X$ and $Y$, and $\Phi(\cdot)$ is the CDF of the standard normal distribution. When there are more than two inputs, the ATP of each input is the probability that its arrival time is larger than those of all the others. The gate criticality is the probability that the gate lies on a critical path. Consider the circuit shown in Fig. 2. It has three primary inputs and two primary outputs. The primary outputs are connected to a virtual output S. The criticality of a gate is the sum, over its fan-out wires, of the product of the ATP of the fan-out wire and the criticality of the next-level gate driven by that wire. For example, in Fig. 2, ATPs are shown next to gate inputs, and gate criticalities are shown in bold italics next to the gates. Gate C has only one input; hence, its ATP is always 1. The criticality of virtual output S is always 1 because every path terminates at S.
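The backward criticality recurrence can be sketched by visiting gates in reverse topological order, starting from the virtual output S with criticality 1. This assumes the ATP of every gate input has already been computed; the dictionary layout and the function name `gate_criticality` are hypothetical, and the example circuit below is illustrative rather than the one in Fig. 2.

```python
from graphlib import TopologicalSorter

def gate_criticality(fanout_atp, sink="S"):
    """Backward criticality propagation.

    fanout_atp: dict gate -> {driven_gate: ATP of the input that gate drives}
    The virtual output `sink` has criticality 1.
    """
    # TopologicalSorter expects node -> set of predecessors (fan-ins).
    deps = {}
    for g, outs in fanout_atp.items():
        deps.setdefault(g, set())
        for h in outs:
            deps.setdefault(h, set()).add(g)
    order = list(TopologicalSorter(deps).static_order())
    crit = {}
    for g in reversed(order):  # every fan-out gate is visited before its drivers
        if g == sink:
            crit[g] = 1.0
        else:
            # Sum over fan-out wires of ATP(wire) * criticality(driven gate).
            crit[g] = sum(atp * crit[h] for h, atp in fanout_atp.get(g, {}).items())
    return crit
```

In the example used below, G2 drives both G3 (ATP 0.3) and the single-input G4 (ATP 1), so its criticality is 0.3·0.6 + 1·0.4 = 0.58.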
E. Genetic Algorithm
GA is an evolutionary algorithm that mimics nature's evolution process. Fundamental concepts of evolution, such as inheritance, selection, mutation, and crossover, are expressed as genetic operators. These operators are used to improve the quality of solutions over several generations [23]. The solution pool is called the population.
In GA, a solution is represented by a chromosome, i.e., an array of values (genes). In the basic scheme, each value carries one bit of information. To start a GA search, we need to generate an initial pool of chromosomes. Distributing the solutions evenly in the design space is important since it determines GA's speed and efficiency [36]. The next step is to select parent chromosomes and reproduce chromosomes for the next generation (referred to as selection and reproduction, respectively). Through partial exchange of genes, the child chromosomes may inherit good features from both parents. This is called crossover. The mutation step then randomly changes some values in the new chromosome. Each value is mutated with a certain probability, called the mutation rate. Both crossover and mutation play an important role in GA. Crossover ensures that good genes can be shared and inherited, whereas mutation introduces new genes that may turn out to be useful in the evolution process. However, too much mutation may not introduce good solutions into the pool; hence, the mutation rate is typically kept low. Fig. 3 shows an example of crossover and mutation. The bold line cuts the two chromosomes A and B into two halves. Exchanging the second halves yields two new chromosomes, A′ and B′. Then, random bits (shown in bold) are mutated. The final two chromosomes, A″ and B″, become members of the next generation.
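The single-cut crossover and per-gene mutation described above can be sketched as two small functions. This is a generic GA illustration, not GenFin's implementation; in a gate-sizing setting the alleles could be the drive strengths 1, 2, 4, 8, and 16, but that mapping is an assumption here. Note that a mutated gene may be redrawn to its current value.

```python
import random

def one_point_crossover(parent_a, parent_b, cut):
    """Exchange the genes after position `cut` between two parents."""
    child_a = parent_a[:cut] + parent_b[cut:]
    child_b = parent_b[:cut] + parent_a[cut:]
    return child_a, child_b

def mutate(chromosome, alleles, rate, rng):
    """Replace each gene, with probability `rate`, by a randomly drawn allele."""
    return [rng.choice(alleles) if rng.random() < rate else gene for gene in chromosome]
```

Crossing `[0, 0, 0, 0, 1, 1, 1, 1]` with `[1, 1, 1, 1, 0, 0, 0, 0]` at the midpoint yields an all-zero and an all-one child, mirroring the exchange of second halves in Fig. 3.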
The above description depicts a very basic GA scenario. Several variants have been proposed based on modifications of the genetic operators. For example, some algorithms use multiple-cut crossover, and some select more than two parents for mating. GenFin combines the power of a conventional GA and generalized simulated annealing [24]. It varies the probability of chromosome selection for reproduction over time. At first, GenFin gives all chromosomes in the pool an equal chance of selection. In later generations, it gradually becomes greedier so that better chromosomes have a higher chance of being selected. The details are presented in Section III-A.
F. Multiobjective Optimization
In logic circuit design, we are generally interested in optimizing more than one metric, e.g., circuit delay, leakage power, and dynamic power. Thus, it is a multiobjective optimization problem. Often, a weighted sum of the various design metrics, with weights based on each metric's importance, is used to solve the optimization problem. For this method to be successful, the weighting vector must be appropriate for the problem as well as for the designer's desired solution. However, it is extremely difficult to find the best weighting vector without knowing the nondominated solutions, i.e., solutions that cannot be improved in one metric without being degraded in another.
Suppose we want to optimize two conflicting metrics, delay and power. In Fig. 4(a), the circles denote nondominated solutions. Each dashed line represents a cost function associated with a weighting vector. The best solution for each weighting vector lies near the first intersection of the dashed line with the nondominated solution curve as the line is swept outward from the origin. The weighted-sum approach may find solution I, but the designer may prefer solution H because of its much-improved delay at the expense of only a small power penalty. We also see that, no matter which weighting vector is used, numerous potentially valuable solutions on the curved line between the two double-arrow lines may be missed by the weighted-sum method. Even with nonlinear weighted-sum methods, it is not possible to capture the shape of the curve formed by the nondominated set.
One way to address the above problem is through the use of Pareto rank. The Pareto rank of a solution is defined as the number of other solutions in the pool that do not dominate it. A solution dominates another if it is better in at least one metric and no worse in the others. Fig. 4(b) shows an example in which the Pareto rank of each solution is marked inside its circle. In this example, the ideal solution is one that gets as close to the origin as possible. Hence, any solution that lies in the shaded area to the top right of another solution is dominated by it, e.g., F is dominated by D. Only two solutions, E and G, do not dominate F. Hence, the Pareto rank of F is 2.
The best solutions, i.e., the solutions with the highest rank, form the nondominated set. These highest-ranked solutions enable the designer to choose one with the preferred tradeoff among the metrics.
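Under the definition above (all metrics minimized), Pareto ranks can be computed with a direct pairwise check. This $O(n^2)$ sketch is illustrative, not GenFin's code; in GenFin's setting each cost vector would have three entries (timing, leakage power, and dynamic power), while the example below uses two.

```python
def dominates(a, b):
    """True if cost vector a dominates b: no worse in every metric, better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_ranks(costs):
    """Pareto rank = number of other solutions that do not dominate this one."""
    n = len(costs)
    return [sum(1 for j in range(n) if j != i and not dominates(costs[j], costs[i]))
            for i in range(n)]

def nondominated(costs):
    """Indices of the highest-ranked solutions (rank n - 1), i.e., the nondominated set."""
    ranks = pareto_ranks(costs)
    return [i for i, r in enumerate(ranks) if r == len(costs) - 1]
```

For the cost vectors (1, 5), (2, 2), (5, 1), (3, 3), and (4, 4), the ranks are 4, 4, 4, 3, and 2: (3, 3) is dominated only by (2, 2), and (4, 4) by both (2, 2) and (3, 3).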
III. ALGORITHM DESCRIPTION
In this section, we first describe the GenFin framework, including the details of the GA implementation. We then discuss how incremental statistical timing/power analysis can improve the efficiency of GenFin. Finally, we discuss heuristics to efficiently reduce the GA search space.

A. GenFin Framework

Fig. 5 shows the evolution cycle of GenFin. A chromosome array is first formed based on the logic gate sizes. GenFin maintains two pools of solutions: 1) an evolution pool (PoolA) and 2) a collection pool (PoolB). The evolution pool traverses the evolution cycle, and the collection pool collects the best solutions from PoolA after each evolution cycle completes. The best solutions collected in a later evolution cycle may dominate previously collected ones. Hence, dominated solutions need to be removed from PoolB.
Before the evolution cycle commences, PoolA needs to be initialized with a number of preliminary solutions. GenFin evaluates each solution by calling the incremental statistical circuit analyzer, incremental FinPrin, which is described in Section III-B. Each solution's Pareto rank is computed in a 3-D space with timing, leakage power, and dynamic power as the three axes. GenFin then selects the best solutions from PoolA, places them in PoolB, and removes the inferior solutions, i.e., those dominated by others, from PoolB. If the halt condition is not met, a number of low-rank solutions are replaced by solutions selected from PoolA. Then, crossover and mutation operators are applied to solutions in PoolA to generate new solutions for the next generation. The number of solutions replaced, the number of crossovers, and the number of mutations per generation can all be specified by the user. GenFin halts when a given number of successive generations do not contribute to PoolB, i.e., do not yield better solutions than those found before.
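The PoolB bookkeeping and the halt condition can be sketched as follows. The sketch assumes solutions are represented only by their cost vectors; the names `update_collection_pool` and `should_halt` and the `patience` parameter are hypothetical stand-ins for GenFin's user-specified settings.

```python
def dominates(a, b):
    """True if cost vector a dominates b (all metrics minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def update_collection_pool(pool_b, candidates):
    """Merge the best PoolA solutions into PoolB, keeping it mutually nondominated.

    Returns (new_pool_b, contributed), where contributed is True if any
    candidate survived in the updated pool.
    """
    merged = list(pool_b)
    for c in candidates:
        if c in merged or any(dominates(p, c) for p in merged):
            continue  # duplicate or dominated: no contribution
        # Evict every collected solution that the newcomer dominates.
        merged = [p for p in merged if not dominates(c, p)] + [c]
    contributed = any(c in merged and c not in pool_b for c in candidates)
    return merged, contributed

def should_halt(stall, contributed, patience):
    """Count successive non-contributing generations; halt after `patience` of them."""
    stall = 0 if contributed else stall + 1
    return stall, stall >= patience
```

Keeping the stall counter separate from the pool update makes the halt rule easy to change without touching the dominance logic.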
1) Evaluation and Ranking:
Solution evaluation means computing the solution's costs. In statistical analysis, each cost is a distribution represented by two parameters: 1) mean and 2) standard deviation. GenFin targets three costs: 1) timing (t); 2) dynamic power (d); and 3) leakage power (l).
Suppose $v$ is one of the above three costs and $C$ is the corresponding design constraint. We define $v_s = v - C$. Then, the corresponding yield is expressed as

$$\text{Yield} = P(v_s \le 0) = \Phi\left(-\frac{\mu_{v_s}}{\sigma_{v_s}}\right), \tag{24}$$

where $\mu_{v_s}$ and $\sigma_{v_s}$ are the mean and standard deviation of $v_s$, respectively, and $\Phi(\cdot)$ denotes the CDF of the standard normal distribution. Therefore, a larger value of $-\mu_{v_s}/\sigma_{v_s}$ indicates a larger yield. Since GenFin aims at optimizing yield, it may seem preferable to rank solutions directly in the 3-D space with timing, leakage power, and dynamic power yields as the three axes. However, this is not the best approach because the CDF is nonlinear: it is very easy to push the yield from 50% to 60%, but extremely hard to push it from 99% toward 100%. Based on (24), we define

$$\text{Distance} = \frac{\mu_{v_s}}{\sigma_{v_s}}.$$

Hence, Yield $= \Phi(-\text{Distance})$, i.e., distance can be mapped to yield through the CDF of the standard normal distribution. Fig. 6 plots yield (y-axis) versus distance (x-axis).
Three points are shown on the graph. The yields of A, B, and C are 96%, 99%, and 99.99%, respectively. Although the yields of B and C are closer to each other, it would take much more effort to improve B to C than to improve A to B. Accordingly, the distance between B and C is larger than that between A and B. Hence, we use the three distances instead of the three yields for solution ranking, because distance better reflects the effort required to improve a solution.
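The yield-distance mapping (Yield $= \Phi(-\text{Distance})$ with $\text{Distance} = \mu_{v_s}/\sigma_{v_s}$) can be reproduced with the standard library's `statistics.NormalDist`. The helper names are hypothetical; the check confirms that the 99% to 99.99% step spans a much larger distance than the 96% to 99% step.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1

def distance_of(mu_v, sigma_v, constraint):
    """Distance = mu_vs / sigma_vs for a cost v ~ N(mu_v, sigma_v^2), with v_s = v - C."""
    return (mu_v - constraint) / sigma_v

def yield_from_distance(distance):
    """Yield = Phi(-Distance)."""
    return STD_NORMAL.cdf(-distance)

def distance_from_yield(y):
    """Inverse mapping: Distance = -Phi^{-1}(Yield)."""
    return -STD_NORMAL.inv_cdf(y)
```

Distances for yields of 96%, 99%, and 99.99% are about -1.75, -2.33, and -3.72, so moving from B to C covers more than twice the distance of moving from A to B.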
2) Selection and Reproduction: After all solutions are ranked, a prespecified number of solutions are replaced, starting from the lowest-ranked one, to make space for the reproduced ones. GenFin uses a variable called fairness to control selection behavior. A higher fairness implies that selection does not discriminate heavily against solutions with lower Pareto ranks. This is done by controlling the solution selection probability. As the evolution cycle repeats, fairness decreases by a prespecified ratio called $f_{dr}$. Fairness is introduced to mimic simulated annealing, letting GenFin escape local minima at the start of the GA cycles; GenFin then gradually becomes greedier so that solutions can converge. Fig. 7 shows how the selection pdf changes theoretically during evolution, assuming a population size of 100. The fairness is initially set to 3. The pdf for generation 1 indicates that each solution has a fair chance of being selected. At generation 50, the bias toward solutions with higher Pareto ranks increases, and at generation 90, the bias becomes quite significant. Solutions with a Pareto rank less than 75 have almost no chance of being selected at generation 90, since GenFin has become quite greedy by then. Selection and reproduction are described in Algorithm 1, where g represents an RV with a standard normal distribution produced by the Box-Muller method. Lines L1-L3 initialize the variables and prepare the solution array. [...] range. After mapping, the distribution of index_selected may not be strictly normal, but it at least resembles the normal distribution. Lines L8 and L9 replace solutions, starting from the lowest-ranked one, with the solution just selected. The procedure repeats until enough children are produced.
Algorithm 1 Selection and Reproduction Algorithm Pseudocode
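Since the body of Algorithm 1 is not reproduced here, the following is only a hedged sketch of fairness-controlled selection, not the authors' pseudocode. Assumptions: solutions are sorted best-first by Pareto rank; the mapping `int(|g| * fairness * n / 3)` with redraws outside `[0, n)` is a hypothetical choice; copies are drawn only from the solutions that are not being replaced; and the multiplicative decay `fairness * (1 - f_dr)` is one possible reading of "decreases by a ratio". All function names are hypothetical.

```python
import math
import random

def box_muller(rng):
    """One standard normal sample via the Box-Muller transform."""
    u1 = 1.0 - rng.random()  # in (0, 1], avoids log(0)
    u2 = rng.random()
    return math.sqrt(-2.0 * math.log(u1)) * math.cos(2.0 * math.pi * u2)

def select_index(n, fairness, rng):
    """Index into a best-first list; larger fairness gives a flatter distribution.

    Hypothetical mapping: |g| * fairness * n / 3, redrawn if it falls outside [0, n).
    """
    while True:
        idx = int(abs(box_muller(rng)) * fairness * n / 3.0)
        if idx < n:
            return idx

def reproduce(ranked, num_new, fairness, rng):
    """Overwrite the num_new lowest-ranked solutions with copies of selected ones."""
    pool = list(ranked)
    survivors = len(pool) - num_new
    for k in range(num_new):
        chosen = pool[select_index(survivors, fairness, rng)]
        pool[len(pool) - 1 - k] = list(chosen)  # replace starting from the worst
    return pool

def decay(fairness, f_dr):
    """Assumed per-generation schedule: fairness shrinks by the ratio f_dr."""
    return fairness * (1.0 - f_dr)
```

With fairness 3 and 100 solutions, the selected index averages roughly 45; after fairness has decayed to 0.2, selection concentrates on the top few solutions, mimicking the increasingly greedy behavior shown in Fig. 7.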
After reproduction, two parents are randomly picked to perform crossover. Then, solutions are randomly mutated. The mutation rate and the number of crossovers are shown in Table II, which also lists the values of the other GA configuration parameters used in our simulations. The number of new solutions is the number of low-rank solutions replaced by newly born children. Once crossover and mutation are performed, one evolution cycle finishes.
B. Incremental FinPrin
In order to speed up GenFin, we use an incremental statistical timing/power analyzer called incremental FinPrin, which is built atop the statistical circuit analyzer FinPrin [18]. Its flow is shown in Fig. 8. It needs three input files to compute the timing, dynamic power, and leakage power distributions: 1) the circuit netlist; 2) a gate sizing file; and 3) wire information obtained using Capo, a routability-driven placer from the University of Michigan [37]. Efficiency is the motivation for using incremental statistical analysis within GenFin, which must analyze every solution in a large population evolved over many generations. For example, the s38417 benchmark from the ISCAS'89 benchmark suite takes 75 generations to finish. Using the original statistical FinFET circuit analyzer (FinPrin) would require GenFin to run for more than four CPU days.
To speed up statistical timing and power queries after gate sizing changes are made to the circuit, we use the intuitive idea that, instead of reanalyzing the whole circuit, we can reuse results from the previous analysis and reanalyze only the logic gates affected by the changes. Incremental FinPrin contains three incremental statistical analyzers: 1) timing; 2) leakage power; and 3) dynamic power. Changing the size of a logic gate affects the leakage power of just that gate. Hence, all that is needed is to statistically subtract the old leakage power of the gate from the circuit leakage and add the new leakage power of the gate, thus obtaining a new leakage distribution for the circuit. For dynamic power (8), changing the size of a logic gate affects the dynamic power of both this gate and the gates that feed it, since their load capacitance changes. Correspondingly, we update their dynamic power in a similar fashion as leakage power. The method for statistical addition of normal and lognormal distributions was discussed earlier in Section II-C.
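The subtract-old/add-new update becomes exact for normally distributed quantities such as dynamic power if each gate's contribution is stored in a first-order canonical form over shared variation sources. The sketch below assumes the correlated PV parameters have already been decomposed into independent standard normal sources (for example, by principal component analysis) plus an independent random part per gate. The class name `CanonicalSum` is hypothetical. For lognormal leakage, the same add/subtract pattern would have to operate on Wilkinson moments instead; that variant is not shown.

```python
import math

class CanonicalSum:
    """Circuit total kept as mean + sum_k coeffs[k] * X_k + independent part.

    Assumption: the X_k are independent standard normal sources shared by all
    gates, and each gate also has its own independent random term.
    """

    def __init__(self, n_sources):
        self.mean = 0.0
        self.coeffs = [0.0] * n_sources
        self.indep_var = 0.0  # variance of the summed independent parts

    def _apply(self, mean, coeffs, indep_sd, sign):
        self.mean += sign * mean
        for k, c in enumerate(coeffs):
            self.coeffs[k] += sign * c
        # Independent random parts of different gates add in variance.
        self.indep_var += sign * indep_sd * indep_sd

    def add(self, mean, coeffs, indep_sd=0.0):
        self._apply(mean, coeffs, indep_sd, +1.0)

    def subtract(self, mean, coeffs, indep_sd=0.0):
        """Remove a contribution that was previously added."""
        self._apply(mean, coeffs, indep_sd, -1.0)

    def std(self):
        return math.sqrt(sum(c * c for c in self.coeffs) + max(self.indep_var, 0.0))
```

Resizing a gate then costs one `subtract` and one `add` per affected gate (the resized gate and its fan-in drivers), independent of circuit size.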
Incremental SSTA is more complicated because the impact of a gate size change propagates to other gates in the fan-out cone. Hence, a simple statistical subtraction or addition is not adequate. We employ a method based on ATP and gate criticality, adapted from a method presented in [12]. It exploits the fact that propagation of delay distributions through the circuit can often be terminated early. This can be illustrated through the simple circuit shown in Fig. 9. Assume that the size of gate A is changed while that of gate B is not. Gate A's delay change propagates to gate B through input2. However, if the ATP of input1 is sufficiently close to 1 both before and after the size change, we can terminate further propagation of the delay distribution and skip SSTA of the gates in the fan-out cone of gate B until another gate size change is encountered.
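The early-termination idea can be sketched as a level-ordered worklist over the fan-out cone of the resized gate. To keep the sketch short, it assumes independent normal arrival times and gate delays (r = 0 in the ATP and Clark max formulas) and a precomputed topological level per gate. All names (`incremental_ssta`, `arrival_at`, `eps`) are hypothetical, and in the test gate C is an illustrative driver of input1 in a Fig. 9-like circuit.

```python
import heapq
import math

def _cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def atp(a, b):
    """P(A > B) for independent normal arrival times a = (mu, sd), b = (mu, sd)."""
    spread = math.hypot(a[1], b[1])
    if spread == 0.0:
        return 1.0 if a[0] > b[0] else 0.0
    return _cdf((a[0] - b[0]) / spread)

def stat_max(a, b):
    """Clark's max for independent normals (r = 0 assumed for brevity)."""
    theta = math.hypot(a[1], b[1])
    if theta == 0.0:
        return a if a[0] >= b[0] else b
    al = (a[0] - b[0]) / theta
    p, q = _cdf(al), _cdf(-al)
    d = math.exp(-0.5 * al * al) / math.sqrt(2.0 * math.pi)
    mu = a[0] * p + b[0] * q + theta * d
    m2 = (a[0] ** 2 + a[1] ** 2) * p + (b[0] ** 2 + b[1] ** 2) * q + (a[0] + b[0]) * theta * d
    return (mu, math.sqrt(max(m2 - mu * mu, 0.0)))

def arrival_at(gate, fanins, delay, arrival):
    """Output arrival = statistical max of input arrivals + gate delay."""
    acc = arrival[fanins[gate][0]]
    for p in fanins[gate][1:]:
        acc = stat_max(acc, arrival[p])
    return (acc[0] + delay[gate][0], math.hypot(acc[1], delay[gate][1]))

def incremental_ssta(changed, fanins, fanouts, delay, arrival, level, eps=1e-3):
    """Re-time the fan-out cone of `changed`, stopping where an unchanged input dominates.

    Updates `arrival` in place and returns the set of re-evaluated gates.
    """
    old = dict(arrival)  # arrival times from the previous analysis
    heap, queued, evaluated = [(level[changed], changed)], {changed}, set()
    while heap:
        _, g = heapq.heappop(heap)  # lowest level first, so fan-ins are final
        arrival[g] = arrival_at(g, fanins, delay, arrival)
        evaluated.add(g)
        moved = [p for p in fanins[g] if p in evaluated]
        stable = [p for p in fanins[g] if p not in evaluated]
        # Terminate if some unchanged input beats every re-timed input with
        # ATP >= 1 - eps both before and after the size change.
        if g != changed and any(
            all(atp(old[s], old[m]) >= 1.0 - eps and atp(arrival[s], arrival[m]) >= 1.0 - eps
                for m in moved)
            for s in stable
        ):
            continue
        for h in fanouts.get(g, []):
            if h not in queued:
                queued.add(h)
                heapq.heappush(heap, (level[h], h))
    return evaluated
```

When the unchanged input of B arrives much later than A's output, gate B is re-timed but its fan-out gate D is skipped; when A's path dominates, D is re-timed as well.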
The experimental results presented in Section IV-A establish the efficacy of incremental FinPrin.
C. Search Space Reduction
As mentioned in [36], for complex problems, it is advantageous to employ heuristics to seed the GA with some initial feasible solutions. We propose two heuristics to reduce search time and guide the direction of evolution.
The first heuristic creates initial solutions using deterministic timing analysis (DTA). We start with a circuit configuration in which all gates are minimum-sized (note that this is a nondominated solution since it has the best leakage power). Then, we find the critical path and size up each gate on it to the next larger size (as mentioned later, the sizes that we use are X1, X2, X4, X8, and X16). If a gate already has the maximum size available in our library (X16), we leave it untouched. Next, we find the critical path in the new configuration and repeat this process on the gates on that critical path. We collect all the configurations and place them in the initial solution pool. We repeat this process until enough initial solutions are produced or all gates on the critical path have the maximum size. If not enough initial solutions are produced, we generate the rest by assigning sizes to the gates randomly.
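The DTA-based seeding loop can be sketched as follows. The delay model is a caller-supplied function; the toy model in the test (delay inversely proportional to size, fan-in loading ignored) is purely hypothetical and stands in for the library's timing model. Function names are also hypothetical.

```python
import random
from graphlib import TopologicalSorter

SIZES = [1, 2, 4, 8, 16]  # X1, X2, X4, X8, X16

def critical_path(fanins, sizes, delay_of):
    """Deterministic longest path; fanins maps each gate to its drivers (PIs allowed)."""
    graph = {g: {p for p in fanins[g] if p in fanins} for g in fanins}
    arrival, pred = {}, {}
    for g in TopologicalSorter(graph).static_order():
        best = max(graph[g], key=lambda p: arrival[p], default=None)
        arrival[g] = (arrival[best] if best is not None else 0.0) + delay_of(g, sizes)
        pred[g] = best
    g = max(arrival, key=arrival.get)  # the latest-arriving gate ends the critical path
    path = []
    while g is not None:
        path.append(g)
        g = pred[g]
    return path[::-1]

def dta_seeds(fanins, delay_of, count, rng):
    """Seed configurations: all-minimum, then repeated critical-path upsizing."""
    sizes = {g: SIZES[0] for g in fanins}
    seeds = [dict(sizes)]
    while len(seeds) < count:
        upsizable = [g for g in critical_path(fanins, sizes, delay_of) if sizes[g] < SIZES[-1]]
        if not upsizable:
            break  # every gate on the critical path is already X16
        for g in upsizable:
            sizes[g] = SIZES[SIZES.index(sizes[g]) + 1]
        seeds.append(dict(sizes))
    while len(seeds) < count:  # fill the remainder with random sizings
        seeds.append({g: rng.choice(SIZES) for g in fanins})
    return seeds
```

Each configuration along the upsizing trajectory is kept as a seed, so the initial pool spans the range from minimum leakage toward better timing.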
The second heuristic exploits gate criticality. The smaller the gate criticality, the smaller the impact of gate size change on the critical path delay. For example, a gate with 0 criticality has no impact on the critical path delay. Hence, increasing the size of 0-criticality gates does not improve delay, but increases the power consumption. Thus, these changes should be avoided.
The percentage of gates that are 0-critical in some of the larger ISCAS'89 benchmarks is shown in Table III. We see that this percentage is quite high. Hence, we can save substantial computation time by keeping the 0-critical gates at minimum size during initialization and leaving them untouched during evolution. Section IV-A presents the experimental results that demonstrate the effectiveness of the two heuristics.
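The criticality heuristic amounts to freezing part of the chromosome. A minimal sketch, assuming each gate's criticality is available as a number in [0, 1] (computed by the statistical timer and treated here as an input) and that a chromosome maps gates to size-level indices; the functions `split_genes`, `random_chromosome`, and `mutate` and the uniform per-gene mutation are illustrative choices, not GenFin's actual operators.

```python
import random


def split_genes(criticality):
    """Partition gates into searchable genes and frozen 0-critical genes."""
    free = sorted(g for g, c in criticality.items() if c > 0.0)
    frozen = sorted(g for g, c in criticality.items() if c <= 0.0)
    return free, frozen


def random_chromosome(free, frozen, n_sizes, rng):
    chrom = {g: 0 for g in frozen}  # 0-critical gates stay at minimum size (X1)
    chrom.update({g: rng.randrange(n_sizes) for g in free})
    return chrom


def mutate(chrom, free, n_sizes, rate, rng):
    child = dict(chrom)
    for g in free:  # frozen genes are never visited, shrinking the search space
        if rng.random() < rate:
            child[g] = rng.randrange(n_sizes)
    return child


crit = {"g1": 0.0, "g2": 0.3, "g3": 0.0, "g4": 1.0}
free, frozen = split_genes(crit)
print(f"{100 * len(frozen) / len(crit):.0f}% of gates are 0-critical")
```

With half the gates frozen in this toy case, the search space shrinks from 5^4 to 5^2 size assignments.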
IV. EXPERIMENTAL RESULTS
In this section, we present the experimental results for GenFin when run on ISCAS'89 and ISPD'13 [38] benchmarks.
The FinFET logic library of GenFin is based on the one given in [18] . It has three types of logic gates: 1) inverter; 2) two-input NAND; and 3) two-input NOR. There are five size choices for each gate: X1, X2, X4, X8, and X16. The benchmark netlists are modified to make use of only the above three types of logic gates. All the results are obtained on a RedHat server running at 2.67 GHz, with 4-GB memory per core and OpenMP acceleration. Six cores are used to run the tool. The runtime is reported in seconds.
GenFin requires four kinds of input files: 1) logic netlist; 2) gate size specification; 3) circuit layout information obtained using Capo; and 4) GA configuration file that includes the parameter values shown in Table II. GenFin outputs the gate sizes of the circuit, which are stored in the chromosome arrays in PoolB, as well as the circuit's timing, dynamic power, and leakage power distributions.
A. Error and Speedup Analysis of Incremental FinPrin
Next, we present results to evaluate the accuracy and efficiency of incremental FinPrin. The percentage of gates that undergo a size change in a given call to incremental FinPrin has an important bearing on its speed and accuracy. For example, 6480 calls are made to incremental FinPrin when GenFin is run on the s13207 benchmark. Out of these, 89% (5769 cases) have <1% of the gates changed and 97.9% (6341 cases) have <10% of the gates changed. Similar results were also obtained for other benchmarks. Hence, we give results for gate change ratios between 0% and 10%. Fig. 10 shows the percentage errors in μ and σ for timing, dynamic power, and leakage power for s13207 for incremental FinPrin relative to FinPrin, where the x-axis represents the gate change ratio. Even when the gate change ratio is the same, the actual changes may be made in different sets of gates in the circuit. Thus, the errors vary within a range. The circle denotes the mean error and the crosses denote the maximum and minimum errors over 10 samples collected for each gate change ratio. Incremental FinPrin can be seen to be very accurate, with the maximum error in μ and σ being only 1.0e−3% and 0.17%, respectively. Errors in μ are generally smaller than those in σ because the statistical subtractions and additions needed for computing μ behave in the same way as deterministic operations.
Incremental FinPrin also speeds up the statistical analysis. Fig. 11 shows the timing, power (which includes both dynamic and leakage power), and total (which takes both timing and power into account) speedups obtained by incremental FinPrin. We again show the maximum, minimum, and average speedups for each gate change ratio. Power computation speedup reaches 490×, whereas timing and total speedups only reach 4.2× and 6.6×, respectively. Timing speedup is smaller than power speedup because incremental statistical timing analysis is much more complex than incremental statistical power analysis. The decrease in speedup with increasing gate change ratio is to be expected since more gates need to be reevaluated. In some rare cases with gate change ratios above 0.04, incremental SSTA takes more time than SSTA (i.e., speedup is <1). The reason is that when more gate size changes occur, delay change propagation affects a larger part of the circuit. In some cases, the incremental timing analyzer may need to reevaluate almost the whole circuit, and the extra runtime of the incremental machinery can then make it consume more CPU time than nonincremental SSTA. However, for all gate change ratios, the average speedup for timing analysis is always above 1×.
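The per-ratio statistics plotted in Figs. 10 and 11 reduce to a percentage error against nonincremental FinPrin and a runtime ratio, summarized by minimum, mean, and maximum over the samples at each gate change ratio. The sketch below assumes the raw runtimes are available as `(t_full, t_incremental)` pairs; the function names and the sample numbers in the usage example are hypothetical, not measured data.

```python
from statistics import mean


def pct_error(incremental, reference):
    """Percentage error of an incremental estimate relative to full FinPrin."""
    return abs(incremental - reference) / abs(reference) * 100.0


def speedup_summary(samples):
    """samples: {gate_change_ratio: [(t_full, t_incremental), ...]}.

    Returns {ratio: (min, mean, max, any_slowdown)}, where any_slowdown flags
    the rare samples in which the incremental analysis was slower (speedup < 1).
    """
    table = {}
    for ratio, runs in sorted(samples.items()):
        s = [t_full / t_incr for t_full, t_incr in runs]
        table[ratio] = (min(s), mean(s), max(s), min(s) < 1.0)
    return table


# Hypothetical runtimes for two gate change ratios.
summary = speedup_summary({0.01: [(2.0, 0.5), (3.0, 1.0)],
                           0.05: [(1.0, 2.0), (4.0, 1.0)]})
```

The mean can stay above 1× even when individual samples fall below it, which is the situation described for timing analysis above 0.04.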
The μ and σ errors for the ISCAS'89 benchmarks are presented in Tables IV and V, respectively. The mean error in these tables is obtained by averaging over all gate change ratios. The maximum μ error among all benchmarks is only 3.1e−2% and the maximum σ error only 0.74%. For the smaller benchmarks (s5378 to s1196), the maximum and mean speedups in power are denoted as not applicable (N/A) because the runtime of the incremental power analyzer is too small for the timing counter to detect. The best timing speedup of 16.8× occurs for s1238 and the best power speedup of 2235.7× for s38584. The best total speedup is 16.9×. The cost of reevaluating power for all gates in the circuit in the nonincremental analysis case grows quadratically with circuit size because the latter half of (13) has O(n²) complexity. However, the cost of reevaluating the incremental changes is linear in circuit size. Thus, power speedup tends to increase dramatically with circuit size. This also typically leads to an increase in total speedup with circuit size.
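The quadratic-versus-linear argument can be illustrated with a generic model. Equation (13) is not reproduced here; this sketch assumes instead that total power is a sum of correlated gate-level terms with means `mu[i]` and covariance matrix `cov`, so its variance is a double sum over all n² covariance entries. Resizing one gate k only alters row and column k, so the variance can be patched in O(n), and the mean update is exact, just like a deterministic sum. Function names are hypothetical.

```python
import math


def full_power_stats(mu, cov):
    """Nonincremental: the mean is O(n), the variance sums all n*n covariance terms."""
    n = len(mu)
    mean = sum(mu)
    var = sum(cov[i][j] for i in range(n) for j in range(n))
    return mean, var


def update_one_gate(mean, var, mu, cov, k, new_mu_k, new_cov_row):
    """Incremental O(n) update after gate k is resized.

    new_cov_row[j] is the new covariance between gate k and gate j
    (new_cov_row[k] is gate k's new variance). mu and cov are updated in place.
    """
    n = len(mu)
    mean += new_mu_k - mu[k]                  # exact: subtract old, add new
    var += new_cov_row[k] - cov[k][k]         # diagonal term
    var += 2.0 * sum(new_cov_row[j] - cov[k][j] for j in range(n) if j != k)
    mu[k] = new_mu_k
    for j in range(n):
        cov[k][j] = cov[j][k] = new_cov_row[j]
    return mean, var


mu = [1.0, 2.0, 3.0]
cov = [[1.0, 0.2, 0.1], [0.2, 2.0, 0.3], [0.1, 0.3, 1.5]]
mean, var = full_power_stats(mu, cov)
mean, var = update_one_gate(mean, var, mu, cov, 1, 2.5, [0.25, 1.0, 0.35])
print(mean, math.sqrt(var))
```

The factor of 2 accounts for the symmetric pair cov[k][j] and cov[j][k] in the double sum.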
Next, we evaluate the impact of three methods for initial pool selection: random selection, DTA (the first heuristic described in Section III-C), and the proposed method based on both DTA and gate criticality (a combination of the two heuristics). Results are shown in Fig. 12. The y-axis denotes the average distance of all solutions in PoolA at various evolution stages [distance is defined in (25)]. A smaller distance indicates a better yield. The x-axis denotes the generation number in the evolution process. We see that the random method performs the worst. The DTA method significantly improves on the random method. The proposed method is the best among the three, outperforming DTA even in timing distance. This is because gate criticality indirectly determines how sensitive the total delay is to a gate change. Dropping 0-critical gates from further consideration, in general, helps in this regard.
Note that timing distance increases after generation 60 (G60). This is because the ability of GenFin to simultaneously push timing, dynamic power, and leakage power yields approaches a limit. Timing yield is then sacrificed to improve dynamic power and leakage power yields. However, the best nondominated solutions obtained in the evolution process are not lost. They are stored in PoolB. Table VII shows runtime per evolution for the three methods. Taking the random method as the base case, the proposed method has a 1.3× speedup, which is slightly better than the speedup for DTA. GenFin also makes multithreaded acceleration possible using OpenMP. The speedup varies with the number of threads (it was 3.3× with six threads).
B. Multiobjective Optimization
In this part, we discuss the set of nondominated solutions collected in PoolB. To compare two sets of nondominated solutions, we first take their union and extract a new set of nondominated solutions from it (note that some solutions in one set may be dominated by a solution in the other set). Then, we define the contribution of each set as the number of solutions it contributes to the final merged set of nondominated solutions. Fig. 13 displays the contributions from the yield-driven and μ-driven GAs. The μ-driven GA aims at finding the solutions with the smallest deterministic timing, leakage power, and dynamic power, whereas the yield-driven GA aims at finding the solutions with the best yields. The contribution is obtained by ranking solutions using their yield triplets. It is not surprising that, beyond the first generation (G1), the yield-driven optimization (which takes both μ and σ into account) performs better in optimizing circuit yield than the μ-driven optimization (which only considers deterministic values during evolution, ignoring σ).
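The merge-and-count procedure behind the contribution metric can be sketched directly. This assumes solutions are represented by (timing, dynamic power, leakage power) yield triplets where larger is better; the yield values in the example are hypothetical, and the quadratic dominance filter is chosen for clarity rather than speed. A triplet present in both sets would be credited to both.

```python
def dominates(a, b):
    """True if yield triplet a is at least as good as b everywhere and better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def nondominated(points):
    return [p for p in points if not any(dominates(q, p) for q in points)]


def contributions(set_a, set_b):
    """Count how many members of the merged nondominated set come from each input set."""
    merged = nondominated(list(set_a) + list(set_b))
    return sum(p in set_a for p in merged), sum(p in set_b for p in merged)


yield_driven = [(0.99, 0.90, 0.95), (0.90, 0.99, 0.95)]
mu_driven = [(0.95, 0.95, 0.95), (0.99, 0.90, 0.90)]
print(contributions(yield_driven, mu_driven))  # (2, 1)
```

Here the μ-driven point (0.99, 0.90, 0.90) is dropped because the yield-driven point (0.99, 0.90, 0.95) dominates it.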
Since GenFin produces a nondominated solution set instead of just a single solution, designers have many design points to choose from and can thus make wise tradeoffs. Fig. 14 shows the nondominated solutions produced by GenFin for s13207 (recall that the more negative the distance, the better the corresponding yield) in both the distance [Fig. 14(a)] and yield [Fig. 14(b)] domains. In reality, the nondominated solutions should be presented in a 3-D space because there are three distances involved. However, since 3-D points would be difficult to discern, we have projected the 3-D values onto a 2-D plane. This is why solution A is not dominated by solution B even though it may appear so in the projection: A is better in dynamic power (dynamic power's distance is denoted by the third element of the triplet). In the distance domain, (−∞, −∞, −∞) is the ideal location. In the yield domain, where the mapping is done through yield = Φ(−distance), with Φ the standard normal cumulative distribution function, the ideal location is (100%, 100%, 100%). Fig. 14(b) shows the benefits of obtaining the nondominated solution set. Consider the two solutions C and D. Designers would probably prefer D to C because D offers an appreciable jump in timing and dynamic power yields with a very small penalty in leakage power yield. However, if designers used a weighted-sum approach to optimization, they might end up with solution C.
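The distance-to-yield mapping, and the reason solutions bunch up near 100% yield, can be illustrated numerically. This sketch assumes the mapping is yield = Φ(−distance) with Φ the standard normal CDF, which is consistent with the ideal point (−∞, −∞, −∞) mapping to (100%, 100%, 100%); the exact definition of distance in (25) is not reproduced here. It uses only the standard library's `statistics.NormalDist`.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def yield_from_distance(distance):
    """Map a distance (more negative is better) to a yield, assuming yield = Phi(-distance)."""
    return STD_NORMAL.cdf(-distance)


for d in (-1.0, -2.0, -3.0, -4.0):
    print(f"distance {d:+.0f} -> yield {100 * yield_from_distance(d):.3f}%")
```

Distances of −1, −2, −3, and −4 map to roughly 84.1%, 97.7%, 99.87%, and 99.997%, so equal steps in distance collapse into ever smaller steps in yield once the distance drops below about −3.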
Another observation that can be made from Fig. 14 is that solutions are scattered more evenly in the distance domain than in the yield domain. In the yield domain, many of the solutions are squeezed into the bottom right corner. This is because a solution with a distance less than −3 is mapped to a location where yield is >99.7%. This demonstrates why targeting distance, instead of yield directly, was the better approach in GenFin. Tables VIII and IX show GenFin results for the ISCAS'89 and ISPD'13 benchmarks [38], respectively. The second column shows the number of nondominated solutions and the third column the number of generations. Columns 4-6 present the 99th percentile values for timing, dynamic power, and leakage power, respectively. The 99th percentile is the value below which 99% of the circuit's values lie. Each reported 99th percentile value is the average across all nondominated solutions. The last column shows the CPU time. Dynamic and leakage power tend to increase with benchmark size. Larger benchmarks also have a larger search space, which lowers the probability that the GA finds a nondominated solution during evolution. Hence, more CPU time is required and fewer nondominated solutions are found. We can see that the ISCAS'89 benchmarks generally have more nondominated solutions than the ISPD'13 benchmarks, because the ISPD'13 benchmarks are an order of magnitude larger.
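The 99th percentile columns can be reproduced from per-solution distribution parameters. A minimal sketch, assuming each metric of each nondominated solution is summarized by a Gaussian with mean μ and standard deviation σ (an assumption made for illustration); the percentile is then μ + z₀.₉₉·σ with z₀.₉₉ ≈ 2.326, and the table entry averages that value over all nondominated solutions. The function names and inputs are hypothetical.

```python
from statistics import NormalDist, mean


def p99(mu, sigma):
    """99th percentile of a Gaussian metric (assumes sigma > 0)."""
    return NormalDist(mu, sigma).inv_cdf(0.99)


def table_entry(solutions):
    """Average 99th percentile over nondominated solutions.

    solutions: list of (mu, sigma) pairs for one metric, one pair per solution.
    """
    return mean(p99(mu, sigma) for mu, sigma in solutions)


z99 = p99(0.0, 1.0)  # standard normal quantile, about 2.326
```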
V. CONCLUSION
In this paper, we presented GenFin, a framework for statistical multiobjective optimization of logic circuits using GA. With the help of Pareto ranking and adaptive selection, GenFin is able to produce a nondominated solution set that can enable circuit designers to make wise tradeoff decisions. GenFin uses a fast FinFET circuit statistical analyzer to speed up the optimization. It only reevaluates gates whose sizes have changed and the neighborhoods of such gates, thereby speeding up statistical analysis while maintaining excellent accuracy. Several heuristics were proposed to reduce the search space. Results presented for ISCAS'89 and ISPD'13 benchmarks demonstrate GenFin's efficacy.
