Abstract-We present an efficient optimization scheme for gate sizing in the presence of process variations. Our method is a worst case design scheme; however, it reduces the pessimism involved in traditional worst case methods by incorporating the effect of spatial correlations in the optimization procedure. The pessimism reduction is achieved by employing a bounded model for the parameter variations in the form of an uncertainty ellipsoid, which captures the spatial correlation information between the physical parameters. The use of the uncertainty ellipsoid, along with the assumption that the random variables corresponding to the varying parameters follow a multivariate Gaussian distribution, enables us to size the circuits for a specified timing yield. Using a posynomial delay model, the delay constraints are modified to incorporate uncertainty in the transistor widths and effective channel lengths due to the process variations. The resulting optimization problem is relaxed to a geometric program and is efficiently solved using convex optimization tools. The effectiveness of our robust gate sizing scheme is demonstrated by applying the optimization on the ISCAS'85 benchmark circuits and testing the optimized circuits by performing Monte Carlo simulations to model the process variations. Experimental results show that the timing yield of the robustly optimized circuits improves manifold over the traditional deterministically sized circuits. For the same transistor area, the circuits sized by our robust optimization approach have, on average, 12% fewer timing violations as compared to the gate sizing solutions that are obtained via the traditional deterministically based guard-banding method.
I. INTRODUCTION TO ROBUST GATE SIZING
THE LIMITATIONS of the manufacturing process in the current technologies lead to random variations in various circuit parameters such as the transistor width, the channel length, and the oxide thickness, which may cause a large spread in the circuit performance measures such as delay and power. Since it is impossible to control process-driven variations, it is essential for the design tools to account for these uncertainties to enable the design of robust circuits that are as insensitive to the device parameter variations as possible.
The optimization of gate sizes offers a degree of flexibility in addressing this issue. The gate sizing problem determines an optimal set of transistor sizes, defined as the ratio of the transistor width w to the effective channel length L e , that minimize the area or power consumption of a combinational circuit, subject to meeting the specified delay constraints. Conventional gate sizing tools employ a static timing analysis (STA) routine to generate the delay constraints by adding intermediate variables at the output of each gate in the circuit, and then solve the resulting optimization problem to determine the widths of the devices in the circuit. The minimum length is chosen for all the devices. However, due to the fact that the nominal designs are perturbed by the random process variations, a large number of chips may fail to meet the original delay specifications. This leads to a reduction in the timing yield of the circuit, defined as the fraction of total chips whose delay does not exceed the original specified value. An obvious way to increase the timing yield of the circuit is to design for the worst case scenario, e.g., choose a delay specification of the circuit that is much tighter than the required delay. Unless this new specification is appropriately selected, this could lead to large overheads in terms of the circuit area and the power, as the optimizer may have to aggressively size the critical as well as noncritical paths. Hence, it is necessary to develop smart worst case methodologies in the presence of process uncertainties that keep the area and power budgets within reasonable bounds.
In this paper, we present a novel worst case scheme based on robust optimization theory. In our method, we modify the delay constraints to incorporate uncertainty in the parameters due to the process variations. An uncertainty ellipsoid method is used to model the random parameter variations, assuming a normal distribution of parameters. The spatial correlations of intradie parameter variations are incorporated in the optimization procedure. We impose no restriction on the sign of the correlation factor, i.e., the parameters may be positively or negatively correlated. The resulting optimization problem is relaxed to a geometric program (GP) and is efficiently solved using convex optimization tools. By using the well-known chi-square probability distribution function, the desired timing yield can be parameterized into the optimization formulation. Our formulation is based on the principle of adding uncertainty-related parameter-correlation-aware margins to delay constraints at the output pin of each logic gate. However, using these guard bands for the delay constraints at the output of each node in the circuit graph, instead of the whole path delay, leads to a problem of overestimation of the effect of variations. We reduce this problem by employing a graph pruning technique to reduce the number of intermediate nodes in the circuit graph and the corresponding arrival time variables in the optimization formulation. The use of a variable-sized uncertainty ellipsoid at different topological levels of the circuit graph helps in further removing the extra timing margins in the constraints.
II. PREVIOUS WORK
Traditional gate sizing methodologies [1], [2] solve the deterministic optimization problem of gate sizing without accounting for variations in parameters. These methods use posynomial delay constraints and formulate the problem as a GP. Section III-B reviews the formulation used in these conventional gate sizing works. Although the method of [1] performs sizing using a sensitivity-based heuristic, the study in [2] offers an exact optimization algorithm to perform gate sizing based on convex programming techniques. There have been several recent attempts to perform uncertainty-aware gate sizing to reduce the timing violations or increase the timing yield. In [3], the gate sizing problem is formulated as a nonlinear optimization problem with a penalty function added to improve the distribution of timing slacks. One of the first works on statistical gate sizing [4] proposes the formulation of statistical objective and timing constraints, and solves the resulting nonlinear optimization formulation. In other works on robust gate sizing [5]-[8], the central idea is to capture the delay distributions by performing a statistical STA (SSTA), as opposed to the traditional STA, and then use either a general nonlinear programming technique or statistical sensitivity-based heuristic procedures to size the gates. In [9], the mean and variances of the node delays in the circuit graph are minimized in the selected paths, subject to constraints on delay and area penalty.
Some of the aforementioned variation-aware gate sizing works are heuristics [6] - [8] without provable optimality properties. The sensitivity-based approaches optimize the statistical cost function in a local neighborhood and cannot guarantee convergence to the globally optimal solution. Others rely on nonlinear nonconvex optimization techniques [4] , [5] , [9] , which either are not scalable to practical circuits or may get stuck in locally optimal solutions. Some of these works [4] , [5] , [10] , [11] ignore important statistical properties of varying parameters such as the spatial correlations.
In [12] , the authors present an interesting approach to optimize the statistical power of the circuit, subject to timing yield constraints under a convex formulation of the problem as a second-order conic program. However, the formulation suffers from the same problem of overestimation of statistical nodal delay constraints as [13] , which will be explained in Section IV-C, and we partially correct this by the techniques described in Sections IV-D and E. More importantly, the solution in [12] relies on a local search over the gate configuration space to identify a size that will absorb the slack assigned by the optimization solution. Such a method based on local searches has to assume that the delay of the gate depends only on the fixed local choices, e.g., a particular size and the fan-out load of a gate. In reality, the gate delay is also a function of the slope of the signals at the input pins of the gate, which, in turn, are functions of the sizes of the fan-in gates and the interconnect delay. Hence, although the local search method of [12] works well for simple delay models as functions of output load only, it is unlikely to work for a realistic delay model also considering input slews.
Recently, a novel method for optimizing the binning yield of a chip has been proposed in [14] . This method provides a binning yield loss function that has a linear penalty for the delay of the circuit exceeding the target delay and proves the convexity of this formulation. However, the method has to rely on an SSTA engine to evaluate the gradient of the binning yield loss function for optimization purposes. This could potentially make the overall procedure considerably slow for many iterations of the optimization loop. As the objective function in the optimization formulation in this paper is nondifferentiable, the procedure could also run into some serious numerical problems while generating the subgradients of the objective function.
In this paper, we propose a novel gate sizing technique based on robust optimization theory [15]. For simplicity, our implementation uses the Elmore delay-based model; however, our approach is applicable to any posynomial delay model such as the rich class of generalized posynomial delay models proposed in [16]. In our method, we first generate posynomial constraints by performing an STA. We then add robust constraints to the original constraint set by modeling the intrachip random process parameter variations as Gaussian variables, contained in a constant probability density uncertainty ellipsoid [17], centered at the nominal values. The method of [18] also uses the ellipsoid uncertainty model, but for the optimization of small-size analog circuits. We use the well-known chi-square distribution tables to assign a timing yield value in our optimization constraints. Under the ellipsoid uncertainty model, the resulting optimization formulation is relaxed to be a GP and is efficiently solved using the convex optimization tools. Furthermore, using a GP to perform robust gate sizing ensures that the optimizer finds a global minimum, which is not guaranteed in the case of a general nonlinear program. The relaxation of the robust counterpart of the conventional deterministic GP-based gate sizing solution as another GP is a major contribution of this paper; in general, it is not true that the robust versions of convex programs are also convex programs [15].
Our robust gate sizing scheme is a type of worst case design method; however, by incorporating spatial correlations in the design procedure, we reduce some pessimism in the design. Spatial intradie correlations between the parameter variations are incorporated in the optimization scheme by using grid-based spatial correlation models used in [19] and [20]. In addition, we show that the nodal constraint formulation adds pessimism, and we reduce some of this pessimism by employing the graph pruning technique of Visweswariah and Conn [21]. Heuristic methods for assigning smaller timing margins at lower topological levels of the circuit graph and increasing guard banding at higher levels by employing different-sized uncertainty ellipsoids also help in reducing the effects of this pessimism.
We focus on the intradie variations in L e and w parameters; however, the method can be easily modified to include interdie variations. Process-driven variations in the interconnect widths and thickness can be also included in our method. The following sections describe in detail the various steps of our robust gate sizing method.
III. PRELIMINARIES
In this section, we will review some of the basic tools and formulations that we build on to obtain our robust optimization formulation.
A. Geometric Programming
A function $f$ is called a monomial function if it can be written in the form
$$f(x) = c\, x_1^{a_1} x_2^{a_2} \cdots x_n^{a_n} \tag{1}$$
where $x \in \mathbb{R}^n_{++}$, $c > 0$, and $a_i \in \mathbb{R}$. The variables in a monomial function and the coefficient $c$ are strictly positive, and the exponent $a_i$ can be any real number.
A sum of monomials is called a posynomial function. It can be written as
$$f(x) = \sum_{k} c_k\, x_1^{a_{1k}} x_2^{a_{2k}} \cdots x_n^{a_{nk}} \tag{2}$$
where $c_k > 0$.
From (1) and (2), a GP can be defined as an optimization problem of the form
$$\begin{aligned} \text{minimize} \quad & f_0(x) \\ \text{subject to} \quad & f_i(x) \le 1, \quad i = 1, \ldots, m \\ & h_i(x) = 1, \quad i = 1, \ldots, m \end{aligned} \tag{3}$$
where $f_0, \ldots, f_m$ are posynomial functions as in (2), and $h_1, \ldots, h_m$ are monomial functions as in (1). GPs are not, in general, convex optimization problems. However, by a simple transformation of variables, $x_i = e^{y_i}$, together with taking the logarithm of the objective and constraint functions of (3), they can be converted to a convex program [15] and, hence, can be efficiently and globally solved using the convex optimization methods.
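The effect of the change of variables can be checked numerically. The sketch below is only an illustration, assuming a hypothetical two-variable posynomial and two arbitrarily chosen points (none of which come from the paper): it applies a midpoint convexity test to the posynomial in the original variables and to its logarithm in the transformed variables $y = \log x$.

```python
import math

def posynomial(terms, x):
    """Evaluate sum_k c_k * prod_i x_i**a_ik for terms = [(c_k, [a_1k, ..., a_nk]), ...]."""
    return sum(c * math.prod(xi ** a for xi, a in zip(x, exps)) for c, exps in terms)

def log_posynomial(terms, y):
    """log f(exp(y)): a log-sum-exp of affine functions of y, hence convex in y."""
    z = [math.log(c) + sum(a * yi for a, yi in zip(exps, y)) for c, exps in terms]
    z_max = max(z)  # subtract the maximum for numerical stability
    return z_max + math.log(sum(math.exp(v - z_max) for v in z))

# Hypothetical posynomial f(x) = x1**0.5 + 1/x2 and two arbitrary positive points.
terms = [(1.0, [0.5, 0.0]), (1.0, [0.0, -1.0])]
xa, xb = [1.0, 1.0], [9.0, 1.0]

# Midpoint test in the original variables: convexity would need f(mid) <= average.
x_mid = [(p + q) / 2 for p, q in zip(xa, xb)]
violates_in_x = posynomial(terms, x_mid) > (posynomial(terms, xa) + posynomial(terms, xb)) / 2

# Same test after the change of variables y = log(x), applied to log f.
ya, yb = [math.log(v) for v in xa], [math.log(v) for v in xb]
y_mid = [(p + q) / 2 for p, q in zip(ya, yb)]
convex_in_y = log_posynomial(terms, y_mid) <= (
    log_posynomial(terms, ya) + log_posynomial(terms, yb)) / 2

print(violates_in_x, convex_in_y)
```

For this particular choice, the midpoint inequality fails in $x$ but holds in $y$, which is consistent with the statement that the transformed problem is convex even though the original one need not be. A single midpoint test is of course not a proof of convexity.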
B. Deterministic Gate Sizing as a GP
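The conventional deterministic gate sizing problem is formulated as
$$\begin{aligned} \text{minimize} \quad & \sum_i a_i x_i^0 \\ \text{subject to} \quad & t_j + d_{ji}(X_0) \le t_i, \quad \forall j \in \text{fan-in}(i) \\ & t_i \le T_{\text{spec}}, \quad \forall i \in \text{PO} \\ & x_{\min} \le x_i^0 \le x_{\max} \end{aligned} \tag{4}$$
where $x_i^0$ represents the nominal size of gate $i$, $a_i$ is some weighting factor such as the number of transistors in a gate cell, $t_j$ is the intermediate input arrival time variable at the fan-in of gate $i$, $d_{ji}$ is the delay of gate $i$ from the $j$th input pin to the output pin as a function of the vector $X_0$ of the nominal gate sizes, $T_{\text{spec}}$ is the specified target delay, and $x_{\min}$ and $x_{\max}$ are the lower and upper bounds on the gate sizes, respectively.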
Using the Elmore delay model, each gate $i$ in the circuit can be replaced by an equivalent $R_{\text{on},i} C_i$ element, where $R_{\text{on},i}$ represents the effective on-resistance of the pull-up or pull-down network, and the term $C_i$ subsumes the source, drain, and gate capacitance of the transistors in the gate. The expressions for $R_{\text{on},i}$ and $C_i$ for a gate $i$ are given by
where the constants $c_1$, $c_2$, and $c_3$ can be derived from [2]. The capacitance and the on-resistance of the transistors in a gate are posynomial functions of the gate size, characterized by the width $w$ of the transistors in the gate. Consequently, the term $R_{\text{on},i} C_i$, which is the equivalent delay contribution of gate $i$ in the circuit, is also a posynomial function of $w$.
From (4) and (5), the delay constraints at each node of the circuit graph can be written as
$$t_j + \sum_l K_l \prod_k x_k^{a_k} \le t_i \tag{6}$$
where $K_l$ is a constant coefficient of the $l$th monomial term in the posynomial delay expression and can be derived from (5), $x_k$ represents the width of gate $k$, and $a_k \in \{-1, 0, 1\}$ is the exponent of the $k$th component of the $X_0$ vector. By substituting (6) in (4) for all gates in the circuit, the conventional transistor sizing is formulated as a GP optimization problem of (3), having a posynomial objective function and posynomial constraints, which can be solved using the convex optimization techniques. In Section IV, we show how the robust version of the standard GP formulation, for the deterministic case, can be converted to another GP.
C. Ellipsoidal Uncertainty Set
For any vectors $\Omega$ and $\Omega_0 \in \mathbb{R}^n$, and a nonsingular matrix $P \in \mathbb{R}^{n \times n}$, an ellipsoid set $U$ is defined as [17]
$$U = \{\Omega \mid (\Omega - \Omega_0)^T P^{-1} (\Omega - \Omega_0) \le \psi^2\}. \tag{7}$$
If $P$ is a symmetric and positive-definite matrix, an alternative representation of (7) is realized by substituting $u = P^{-1/2}(\Omega - \Omega_0)$, which gives
$$U = \{\Omega_0 + P^{1/2} u \mid \|u\|_2 \le \psi\} \tag{8}$$
where $\|u\|_2 = \sqrt{u^T u}$ is the two-norm of vector $u$. For a symmetric and positive-definite matrix $P$, the matrix $P^{1/2}$ can be computed by the eigendecomposition of $P$. The ellipsoid represents an $n$-dimensional region, where the vector $\Omega$ varies around the center point $\Omega_0$. The vector $u$ characterizes the movement of $\Omega$ around $\Omega_0$. Fig. 1 illustrates the ellipsoid in $\mathbb{R}^2$. The half-lengths of the axes of the ellipsoid are a factor $\psi$ of the square roots of the eigenvalues $\lambda_1$ and $\lambda_2$ of the matrix $P$, and the directions of the axes are given by the eigenvectors $e_1$ and $e_2$ of $P$.
Considering the vector $\Omega$ to consist of random variables corresponding to the parameters of variations, with an associated covariance matrix given by $P$, and assuming that the parameters of variation follow a Gaussian distribution, the ellipsoid set described in (7) and (8) can be used as a bounded model of variations. In particular, it can be shown that the constant probability density contours of a multivariate normal distribution represent an ellipsoid set. The joint probability density function (pdf) of the multivariate normal random vector $\Omega$ with mean $\Omega_0$ and covariance matrix $P$ is
$$f(\Omega) = \frac{1}{(2\pi)^{n/2} |P|^{1/2}} \exp\left(-\frac{1}{2} (\Omega - \Omega_0)^T P^{-1} (\Omega - \Omega_0)\right) \tag{9}$$
where $|P|$ is the determinant of the covariance matrix $P$, and $n$ is the number of components in the variation vector $\Omega$. It is clear from (9) that the pdf of a multivariate normal distribution is constant wherever the quadratic form in the exponent equals a constant $c$, i.e., wherever
$$(\Omega - \Omega_0)^T P^{-1} (\Omega - \Omega_0) = c.$$
This relation precisely represents the surface of an ellipsoid given by (7) with $c = \psi^2$. Since the covariance matrix $P$ is symmetric and positive definite [17], we can also equivalently represent the constant probability ellipsoid as (8). Thus, from the discussion above, by assuming normality of the parameter distribution, the ellipsoid set can be regarded as a high-dimensional region, inside which the parameters randomly vary. This bounded model of parameter variations in the form of an ellipsoid set is referred to as an uncertainty ellipsoid. In Section IV, we use this uncertainty ellipsoid model to simplify our robust constraints and formulate the robust GP optimization problem.
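The equivalence of the two ellipsoid representations can be illustrated in two dimensions. The following is a minimal sketch, assuming a hypothetical $2 \times 2$ covariance matrix and radius $\psi$ (both illustrative values); it builds $P^{1/2}$ with a closed form that holds for $2 \times 2$ symmetric positive-definite matrices, maps points $u$ with $\|u\|_2 = \psi$ to $\delta\Omega = P^{1/2} u$, and evaluates the quadratic form $\delta\Omega^T P^{-1} \delta\Omega$, which should equal $\psi^2$.

```python
import math

def sqrt_spd_2x2(P):
    """Symmetric square root of a 2x2 symmetric positive-definite matrix.

    Closed form P^(1/2) = (P + s*I) / t with s = sqrt(det P) and
    t = sqrt(trace P + 2*s); it gives the same matrix as the eigendecomposition.
    """
    (a, b), (_, d) = P
    s = math.sqrt(a * d - b * b)
    t = math.sqrt(a + d + 2.0 * s)
    return [[(a + s) / t, b / t], [b / t, (d + s) / t]]

def quad_form_inv(P, delta):
    """Evaluate delta^T P^{-1} delta for a 2x2 symmetric matrix P."""
    (a, b), (_, d) = P
    x, y = delta
    return (d * x * x - 2.0 * b * x * y + a * y * y) / (a * d - b * b)

P = [[4.0, 1.2], [1.2, 1.0]]   # hypothetical covariance of two correlated parameters
psi = 2.0                      # hypothetical ellipsoid radius
S = sqrt_spd_2x2(P)

boundary_values = []
for k in range(8):
    theta = 2.0 * math.pi * k / 8
    u = (psi * math.cos(theta), psi * math.sin(theta))   # ||u||_2 = psi
    delta = (S[0][0] * u[0] + S[0][1] * u[1],            # delta = P^(1/2) u
             S[1][0] * u[0] + S[1][1] * u[1])
    boundary_values.append(quad_form_inv(P, delta))

print(boundary_values)  # each entry should be close to psi**2
```

For larger dimensions, the closed form no longer applies and an eigendecomposition routine from a numerical library would be the natural replacement.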
D. Chi-square Distribution
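If $r_i$ are $n$ independent normally distributed random variables with means $\mu_i$ and variances $\sigma_i^2$, then the random variable $z = \sum_{i=1}^{n} \left((r_i - \mu_i)/\sigma_i\right)^2$ is distributed according to the chi-square distribution $\chi^2_n$ with $n$ degrees of freedom [17]. The chi-square distribution is a special case of the gamma distribution, and for a random variable $z$ following the chi-square distribution, the cumulative distribution function (cdf) of $z$ is given by [22]
$$F(z; n) = \frac{\gamma(n/2,\, z/2)}{\Gamma(n/2)} \tag{10}$$
where $\Gamma$ is the gamma function, and $\gamma$ is the lower incomplete gamma function [22]. Referring back to (7), it can be proved that the random variable $(\Omega - \Omega_0)^T P^{-1} (\Omega - \Omega_0)$ follows the $\chi^2_n$ distribution [17]. Therefore, the solid ellipsoid given by (7) can be assigned a prespecified amount of probability $\alpha$ as
$$\Pr\left\{(\Omega - \Omega_0)^T P^{-1} (\Omega - \Omega_0) \le \psi^2\right\} = F(\psi^2; n) = \alpha \tag{11}$$
where $F$ is the chi-square cdf given by (10).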
As will be explained in Section IV, we use the uncertainty ellipsoid to pad the deterministic delay constraints, and with the prespecified probability α given by the lower bound on timing yield specification, we define the size of the ellipsoid. This determines the amount of margin that is required for each delay constraint.
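The mapping from a yield target $\alpha$ to an ellipsoid size $\psi$ can be reproduced without tables. The sketch below is one possible way to do it, not the procedure used in the paper: it evaluates the chi-square cdf through the standard power series for the lower incomplete gamma function and inverts it by bisection. The function names `chi2_cdf` and `ellipsoid_radius` are illustrative.

```python
import math

def chi2_cdf(z, n):
    """Chi-square cdf F(z; n) = gamma(n/2, z/2) / Gamma(n/2).

    The lower incomplete gamma function is evaluated with the series
    gamma(a, x) = exp(-x) * x**a * sum_k x**k / (a * (a+1) * ... * (a+k)).
    """
    if z <= 0.0:
        return 0.0
    a, x = n / 2.0, z / 2.0
    term = 1.0 / a
    total = term
    k = 1
    while term > 1e-16 * total:
        term *= x / (a + k)
        total += term
        k += 1
    return total * math.exp(-x + a * math.log(x) - math.lgamma(a))

def ellipsoid_radius(alpha, n):
    """Return psi such that F(psi**2; n) = alpha, found by bisection on psi**2."""
    lo, hi = 0.0, 1.0
    while chi2_cdf(hi, n) < alpha:   # grow the bracket until it contains the quantile
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if chi2_cdf(mid, n) < alpha:
            lo = mid
        else:
            hi = mid
    return math.sqrt(hi)

psi = ellipsoid_radius(0.9, 4)   # 90% probability mass, four varying parameters
print(round(psi, 2))             # about 2.79
```

For a four-parameter ellipsoid and $\alpha = 0.9$, this yields $\psi \approx 2.79$. A library routine for the chi-square quantile would serve equally well in practice.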
IV. VARIATION-AWARE GATE SIZING

A. Effect of Variations on Constraints
The deterministic posynomial constraints of (6) can be represented as
$$t_j + f_i(X_0) \le t_i \tag{12}$$
where $f_i(X_0) = \sum_l K_l \prod_k x_k^{a_k}$ represents the $i$th constraint function, and $X_0$ is the vector representing the nominal gate sizes $x_i^0$ for all gates. The conventional GP optimization assigns a set of optimal $x_i^0$ to the vector $X_0$, so that each delay constraint is satisfied, i.e., $t_j + f_i(X_0) \le t_i$ for all constraints $i$, and the area objective is minimized.
However, due to the effect of process variations, the posynomial delay models of the gate can no longer be assumed to be deterministic quantities. Thus, the constraint inequalities at each node should be rewritten as
$$t_j + f_i(X_0, \Omega) \le t_i \tag{13}$$
where $\Omega$ is the random vector of perturbations around the nominal values of the parameters. For the cases when the new value of the constraint function satisfies $t_j + f_i(X_0, \Omega) > t_i$, the effect of the random process variations leads to the original constraints being violated and a possible timing failure for the circuit.
Assuming that the random parameter perturbations around the nominal values are small, the new value of the gate delay model $f_i(X_0, \Omega)$ can be approximated by a first-order Taylor series expansion as
$$f_i(X_0, \Omega) \approx f_i(X_0, \Omega_0) + \nabla_{\Omega_0} f_i(X_0, \Omega)^T\, \delta\Omega \tag{14}$$
where $\nabla_{\Omega_0}$ represents the gradient calculated at the nominal values of the parameters, and $\delta\Omega$ represents the zero-mean random variation in the parameters such as the transistor width, the effective channel length, and the oxide thickness around the nominal values. Note that the coefficient $K_l$ also depends on the parameters and, therefore, should be regarded as a function $K_l(\Omega)$ of the perturbation vector. In (14), the term $\nabla_{\Omega_0} f_i(X_0, \Omega)^T\, \delta\Omega$ is the variational term representing the effect of process variations added to the nominal term $f_i(X_0, \Omega_0) = \sum_l K_l \prod_k x_k^{a_k}$. To safeguard against the uncertainty of process variations, it is necessary to meet the constraint $t_j + f_i(X_0, \Omega) \le t_i$ for the maximum value of the variational term. In other words
$$t_j + f_i(X_0, \Omega_0) + \max_{\delta\Omega} \left(\nabla_{\Omega_0} f_i(X_0, \Omega)^T\, \delta\Omega\right) \le t_i. \tag{15}$$
Next, we show that by employing the concept of an uncertainty ellipsoid U , the constraint of (15) can be transformed to a set of posynomial constraints, so that the robust optimization formulation remains a GP, and can be efficiently solved. Our robust GP formulation is applicable for all cases where the original constraints are in the form of a generalized posynomial [16] .
We use the uncertainty ellipsoid to model the process variations that randomly perturb the transistor parameters around the nominal values for which they were designed. As the random vector $\Omega$ of uncertain parameters varies around the nominal parameter vector $\Omega_0$, the variations are considered to be bounded within the ellipsoid regions defined by (8). In other words, referring to (8), the variation $\delta\Omega$ from $\Omega_0$ is given by $\delta\Omega = P^{1/2} u$ with $\|u\|_2 \le \psi$. Alternatively, we could have chosen the variation $\delta\Omega$ in the parameters to be bounded in an $n$-dimensional box given by $\Omega_{\min} \le \delta\Omega \le \Omega_{\max}$. However, using the box as a model for bounded variation ignores any correlation information between the random components of $\Omega$, as each component can independently move inside a box, assuming any values between the minimum and maximum range. Thus, optimizing for a maximum variation in such a box region would translate to an overly pessimistic design. Moreover, an $n$-dimensional box modeling of parameter variations would be accurate only in the highly unlikely case when all parameters are statistically independent with respect to each other and follow a uniform distribution. Most parameters have been observed to follow a distribution that resembles a Gaussian one. The advantage of using the ellipsoid uncertainty model is that it not only accurately models the region of variation for normally distributed parameters but also directly captures any correlations between the parameters through the appropriately constructed elements of the covariance matrix $P$. The covariance matrix can be derived from spatial correlation models such as the ones used in [19] and [20].
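The difference between the two bounded models can be made concrete for a linearized delay. The sketch below rests on stated assumptions rather than on the paper's experiments: the sensitivity vector `phi`, the covariance matrices, and $\psi$ are hypothetical, and the box is taken to be the smallest axis-aligned box that encloses the ellipsoid, with half-widths $\psi\sigma_i$. It uses the standard facts that the maximum of $\phi^T \delta\Omega$ over the ellipsoid is $\psi\sqrt{\phi^T P \phi}$, while over the box it is $\psi\sum_i |\phi_i|\sigma_i$.

```python
import math

def ellipsoid_margin(phi, P, psi):
    """Maximum of phi^T d over {d : d^T P^{-1} d <= psi^2}, i.e. psi * sqrt(phi^T P phi)."""
    n = len(phi)
    quad = sum(phi[i] * P[i][j] * phi[j] for i in range(n) for j in range(n))
    return psi * math.sqrt(quad)

def box_margin(phi, P, psi):
    """Maximum of phi^T d over the box |d_i| <= psi * sigma_i (correlations ignored)."""
    return psi * sum(abs(phi[i]) * math.sqrt(P[i][i]) for i in range(len(phi)))

psi = 2.0              # hypothetical ellipsoid radius
phi = [1.0, -1.0]      # hypothetical sensitivities with opposite signs

margins = {}
for rho in (-0.8, 0.0, 0.8):
    P = [[1.0, rho], [rho, 1.0]]   # unit variances, correlation rho
    margins[rho] = (ellipsoid_margin(phi, P, psi), box_margin(phi, P, psi))

for rho, (ell, box) in margins.items():
    print(rho, round(ell, 3), round(box, 3))
```

The box margin is the same for every correlation value, whereas the ellipsoid margin shrinks when the correlation makes the two contributions partially cancel. This is the sense in which the box model discards correlation information.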
In the next section, we show with the aid of a small example the use of the uncertainty ellipsoid model in converting the constraint of (15) to a set of posynomial constraints and formulating the robust GP for gate sizing in the presence of process variations. 
B. Robust GP Formulation
We use a simple example to explain the procedure to incorporate the process variation effects in the delay constraint set. We use the toy circuit of Fig. 2, comprising just one driver gate and one load gate, for this illustration; however, the idea can be generalized to arbitrarily large circuits. In this example, we consider the widths $(w_1, w_2)$ and the effective channel lengths $(L_{e1}, L_{e2})$ of the two gates as the only varying parameters. The scheme can be directly extended to include other parameters.
Applying the Elmore delay model to the gates of circuit of Fig. 2 , and for simplicity, neglecting the interconnect delay and the effect of drain and source capacitance of the driver gate, the delay constraint for the circuit can be written as
where K 1 and K 2 are constants. As explained in Section IV-A, to ensure that the delay constraint of (16) is met under the effect of random process variations, the first-order Taylor series expansion of the constraint function results in the following relation:
where $w_0$ and $L_{e0}$ represent the nominal values of the transistor $w$ and $L_e$, respectively, and $\delta w$ and $\delta L_e$ are the random variations in $w$ and $L_e$, respectively. Employing the ellipsoid uncertainty model of (8), the vector $\delta\Omega$ that collects these variations is written as
$$\delta\Omega = P^{1/2} u \tag{18}$$
where $P$ is the covariance matrix of the random vector $\Omega$ consisting of the variations in gate $w$ and $L_e$ of the driver and the load gate of Fig. 2, and $u$ is the vector bounding the variation within the 4-D ellipsoid centered at the nominal values of $w$ and $L_e$, with $\|u\|_2 \le \psi$.
We introduce a vector φ to collect the coefficients of the variational parameters of (17) as
From the definitions in (18) and (19), (17) can be rewritten as
where $\langle a, b \rangle$ represents the inner product of vectors $a$ and $b$.
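Since the covariance matrix $P$ is symmetric and positive definite [17], it admits the eigendecomposition $P = Q\, \mathrm{diag}(\lambda_1, \ldots, \lambda_n)\, Q^T$, so that
$$P^{1/2} = Q\, \mathrm{diag}\left(\lambda_1^{1/2}, \ldots, \lambda_n^{1/2}\right) Q^T$$
where $Q$ is the matrix containing the eigenvectors of $P$, and $\lambda_1, \ldots, \lambda_n$ are the $n$ eigenvalues of $P$. Next,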
T , the positive and negative terms of the elements of vector Mφ can be separated as
where $\eta_1$ and $\eta_2$ contain all the positive and negative terms, respectively, of the elements of the vector $M\phi$. From the well-known result of the Cauchy-Schwarz inequality
and from (21) and (22), along with the fact that in the ellipsoid uncertainty model, $\|u\|_2 \le \psi$, a sufficient condition for (20) is
We then introduce the following two additional robust variables r 1 and r 2 :
The inequality of (24) is then replaced by the following relaxed constraints:
As the optimizer tries to minimize the value of the robust variables r 1 and r 2 , the relaxed inequality constraints of (27) and (28) would enforce the equality constraint of (25) . The inequality of (26) 
are the summation of monomials with positive coefficients. Consequently, the constraints of (27) and (28) are also posynomials. Hence, by following the procedure described in the above equations, we convert the nonrobust posynomial constraint of (16) to a set of robust posynomial constraints of (26)-(28) by introducing two additional variables. It is worth emphasizing that unlike [13] , the robust GP formulation presented in this section does not restrict the elements of the P matrix to be only nonnegative, i.e., the method can handle both positively and negatively correlated parameters.
Next, we address the issue of assigning a timing yield parameter to the optimization formulation. As discussed in Section III-D, we can assign a prespecified probability $\alpha$ to the uncertainty ellipsoid model of variations by using the $\chi^2_n$ distribution. From (11), we can determine $\psi^2$ as the upper $100\alpha$th percentile of the $\chi^2_n$ distribution from the standard tables of the chi-square cdf. For instance, for the example circuit of Fig. 2, corresponding to $\alpha = 0.9$ or 90%, the value of $\psi$ determined from the $\chi^2_4$ cdf tables for the 4-D ellipsoid is $\psi = 2.79$. The value assigned to $\psi$ determines the size of the uncertainty ellipsoid that is used to pad the nominal terms in the timing constraints. The prespecified probability $\alpha$ serves as the lower bound on the timing yield because the robust constraints formulated using the ellipsoid margin corresponding to such an $\alpha$ would be satisfied for at least $100\alpha\%$ of all cases. Since there are other points outside the ellipsoid set of the specified probability value that may not cause timing violations, the timing yield could be more than $\alpha$.
For a general circuit, the procedure described for the example circuit of Fig. 2 is repeated for each constraint. Thus, by addition of at most two additional variables for each constraint, robustness against the process uncertainties is added to the original constraint set while still maintaining the desirable posynomial structure of the constraints. By this procedure, we convert the conventional GP formulation of the gate sizing problem to a robust gate sizing problem, which is also a GP and, hence, can be efficiently solved using the convex optimization machinery.
C. Overestimation of Variations
The optimization formulation described in Section IV-B adds margins to the deterministic constraints generated by an STA procedure. Because separate margins are added at each node of the circuit graph, instead of the whole path, the resulting formulation could result in a large overestimation of the variational component of the circuit delay, which could lead to excessive design penalties.
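To understand the problem of this overestimation of the variation, consider a simple example circuit consisting of a chain of $m$ inverters as shown in Fig. 3. For this simple circuit, an STA module would generate the following block-based constraints:
$$\begin{aligned} d_1(X_0) &\le t_1 \\ t_1 + d_2(X_0) &\le t_2 \\ &\;\;\vdots \\ t_{m-1} + d_m(X_0) &\le T_{\text{spec}} \end{aligned} \tag{29}$$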
where d i is the delay of the ith inverter, which is a function of the vector of nominal gate sizes X 0 . By the method explained in Section IV, the equivalent robust constraints for the example circuit of Fig. 3 can be written as
It is easy to see that for the simple circuit of Fig. 3 , the delay is given by the whole path delay as
Thus, the effect of variations can be accounted for by a simple robust constraint of the form
For any m nonnegative functions, y 1 , . . . , y m , the following inequality is well known:
Therefore, for the variation terms in the constraints of (30) and (31), the following inequality holds:
It is clear from (30), (31), and (33) that the approach of adding the variational component of delay at each node leads to extra guard banding. Another way to understand the amount of pessimism introduced in the formulations is by realizing that the robust GP formulation attempts to safeguard against a probability of timing failure that is greater than the actual failure probability, which could lead to extra design margins.
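The size of this extra guard band can be illustrated on an inverter chain under a simplified model. The sketch below is an illustration under assumptions, not the paper's exact margin expressions: each stage delay is linearized, stage $i$ is sensitive only to its own parameter, all parameters have unit variance and a common pairwise correlation `rho`, the same $\psi$ is used for every constraint, and each margin is taken as the exact worst case $\psi\sqrt{g^T P g}$ of the linearized term over the ellipsoid.

```python
import math

def worst_case_margin(g, P, psi):
    """psi * sqrt(g^T P g): worst case of the linear term g^T d over the uncertainty ellipsoid."""
    n = len(g)
    quad = sum(g[i] * P[i][j] * g[j] for i in range(n) for j in range(n))
    return psi * math.sqrt(quad)

def chain_margins(m, rho, psi=1.0, sens=1.0):
    """Return (sum of per-node margins, single whole-path margin) for an m-stage chain.

    Hypothetical model: stage i depends only on parameter i with sensitivity sens;
    parameters have unit variance and pairwise correlation rho.
    """
    P = [[1.0 if i == j else rho for j in range(m)] for i in range(m)]
    stage_g = [[sens if k == i else 0.0 for k in range(m)] for i in range(m)]
    node_wise = sum(worst_case_margin(g, P, psi) for g in stage_g)
    path_g = [sum(g[k] for g in stage_g) for k in range(m)]   # sensitivity of the path delay
    path_wise = worst_case_margin(path_g, P, psi)
    return node_wise, path_wise

uncorrelated = chain_margins(4, 0.0)
correlated = chain_margins(4, 0.5)
print(uncorrelated, correlated)
```

With four uncorrelated stages, the accumulated per-node margin is twice the whole-path margin, and the gap narrows as the correlation increases. This follows from the triangle inequality for the norm induced by $P$, which is the same mechanism that makes node-wise padding pessimistic.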
For a simple circuit that is similar to the one in Fig. 3 , it is trivial to trace the path delay and then add margin to the whole path delay constraint. However, in general, the number of paths in a circuit graph can be exponential in the number of nodes. Therefore, enumeration of paths has a prohibitive cost for large circuits consisting of thousands of gates.
To reduce the problem of unnecessary padding at the intermediate nodes in the circuit, without incurring the exponential cost of formulating the path-based constraints, we employ a graph pruning technique proposed in [21] . Section IV-D discusses this pruning method.
D. Graph Pruning
Visweswariah and Conn [21] proposed a technique to reduce the number of variables, constraints, and redundancy in the circuit optimization formulation by removing the internal nodes and the original edges connected to them in the circuit graph. We adapt this graph pruning technique to our method to reduce the pessimism in our gate sizing formulation.
This technique alters the delay constraint formulation by operating on the timing graph of the circuit. An initial timing graph of the circuit is constructed by representing each pin of a gate in the circuit as a vertex, and the connections between an input and an output pin of the same gate, and between an output pin of a gate and an input pin of its fan-out gate, as edges in the graph. The arrival time at a pin of a gate is used to annotate the edge originating at the node corresponding to that pin. Two additional nodes representing the primary inputs (PIs) and the primary outputs (POs) are added to the vertex set of the graph. Fig. 4 shows a simple circuit and its corresponding timing graph.
In the graph pruning method, the nodes of the graph are iteratively screened for a possible elimination by evaluating the cost of this node removal. The cost is typically expressed as some simple function of change in the number of variables and constraints in the optimization formulation after the vertex under consideration is removed from the graph. If the evaluated cost is negative, implying a reduction in the problem size, the node is removed, and subsequently, all incoming and outgoing edges of this node are also pruned from the graph.
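The screening loop described above can be sketched as follows. This is a simplified illustration and not the implementation of [21]: the cost function is an assumption, namely the change in the number of constraints plus the change in the number of variables when a node with $p$ incoming and $q$ outgoing edges is replaced by $p \cdot q$ merged edges, and the edge representation `(source, destination, delays)` and the name `prune` are hypothetical.

```python
def prune(edges, keep):
    """Iteratively remove nodes whose removal shrinks the problem.

    edges: list of (src, dst, delays), where delays is a tuple of delay labels; each
           edge stands for one constraint t_src + sum(delays) <= t_dst.
    keep:  nodes that must never be removed (e.g. the PI and PO nodes).
    """
    edges = list(edges)
    changed = True
    while changed:
        changed = False
        nodes = {n for s, d, _ in edges for n in (s, d)} - set(keep)
        for node in sorted(nodes):
            fan_in = [e for e in edges if e[1] == node]
            fan_out = [e for e in edges if e[0] == node]
            p, q = len(fan_in), len(fan_out)
            # Assumed cost: (new constraints - removed constraints) - one removed variable.
            cost = (p * q - (p + q)) - 1
            if cost < 0:
                # Connect every fan-in node to every fan-out node, accumulating delays.
                merged = [(s, d, d_in + d_out)
                          for s, _, d_in in fan_in
                          for _, d, d_out in fan_out]
                edges = [e for e in edges if node not in (e[0], e[1])] + merged
                changed = True
                break   # the graph changed, so rescan from the start
    return edges

# Hypothetical three-edge chain PI -> a -> b -> PO.
chain = [("PI", "a", ("d1",)), ("a", "b", ("d2",)), ("b", "PO", ("d3",))]
pruned = prune(chain, keep={"PI", "PO"})
print(pruned)   # a single PI -> PO edge carrying the whole path delay
```

For the chain, both internal nodes have one incoming and one outgoing edge, so each removal has negative cost and the result is a single path-delay constraint. A node with several fan-in and fan-out edges (for example, $p = q = 3$) has positive cost under this assumption and is left in place.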
1) Example of the Pruning Procedure:
The application of the graph pruning method of [21] to reduce the pessimism in our optimization formulation can be best explained using a simple example circuit and its corresponding timing graph. For this, we refer back to the circuit of Fig. 4. As shown in the figure, the arrival times at each pin of the logic gates are represented by the variables $t_1, \ldots, t_7$. For simplicity, it is assumed that the interconnects have zero delay, and that all PIs arrive at time $t = 0$. The $d_{ji}$ variables in Fig. 4(a) represent the pin-to-pin delay of a logic gate. Fig. 4(b) shows the corresponding timing graph for the example circuit. By employing an STA procedure, the delay constraints at the output pin of each gate in the circuit of Fig. 4(a) can be written as (34), where $X_0$ is the vector consisting of the sizes of the three gates of Fig. 4(a), and $\Omega$ is the random vector corresponding to the process uncertainties. From the discussion in Section IV-C, adding margins for each of the constraints of (34) can result in excessive guard banding against the effect of variations and, hence, a pessimistic design.
As described in the previous section, the circuit timing graph of Fig. 4(b) and the corresponding constraint formulation of (34) can be altered by selectively removing nodes from the graph. Fig. 5 illustrates the application of the graph pruning technique on the example circuit of Fig. 4 . For this specific example, the pruning cost chosen is simply the difference in the number of variables and constraints after removing a node from the graph. Fig. 5(a) shows the graph obtained after eliminating nodes 1, 2, 3, and 4 in the original graph. Similarly, Fig. 5(b) represents the graph after removing nodes 5 and 6 as well. The final pruned graph, obtained after removing all nodes except the PI and PO nodes, is shown in Fig. 5(d) . For each pruned node, a new edge is added between the fan-in and fan-out nodes of the removed node, and the new edge is annotated with the pruned arrival times. This annotation is required to generate the timing constraints at the end of the pruning procedure.
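The node elimination step described above can be sketched as follows. This is a minimal illustration under stated assumptions: edges carry a tuple of symbolic delay names whose sum is the delay accumulated along the edge, and the small graph and the names `d_xa` and `d_xb` are hypothetical rather than those of Fig. 4 or Fig. 5.

```python
def prune_node(edges, node):
    """Remove `node` and bridge each of its fan-ins to each of its fan-outs.

    edges: list of (u, v, terms), where `terms` is a tuple of delay names
    accumulated along that edge. Parallel edges are kept, since each one
    corresponds to a distinct timing constraint.
    """
    fan_in = [e for e in edges if e[1] == node]
    fan_out = [e for e in edges if e[0] == node]
    kept = [e for e in edges if node not in (e[0], e[1])]
    bridged = [(u, w, t_in + t_out)
               for (u, _, t_in) in fan_in
               for (_, w, t_out) in fan_out]
    return kept + bridged

# Hypothetical single gate with input pins a, b and output pin x.
graph = [
    ("PI", "a", ()), ("PI", "b", ()),
    ("a", "x", ("d_xa",)), ("b", "x", ("d_xb",)),
    ("x", "PO", ()),
]
for pin in ("a", "b", "x"):
    graph = prune_node(graph, pin)
```

After all three pins are removed, every remaining edge runs from PI to PO, and its annotation lists the delays of one path. Each such edge yields one path delay constraint of the form "sum of the listed delays is at most the target delay".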
From the edge annotations, and the original constraints of (34), the constraints corresponding to the final pruned circuit graph of Fig. 5(d) can be written as the path delay constraints of (35).
In the above set of constraints, the pruning method eliminates all nodes except the ones corresponding to the PIs and POs. Since all intermediate arrival time variables t i are pruned, the above formulation does away with the problem of keeping redundant margins for the constraints at the output pin of each node. It should be emphasized that the example circuit of Fig. 4 is an extremely simple case for which the pruning method can eliminate all intermediate nodes and arrive at the path delay constraints of (35). Therefore, the problem of overestimation of the effect of variation, as described in Section IV-C, is completely resolved for this example circuit. In general, for practical circuits, the graph pruning procedure could determine some nodes that are unsuitable for pruning, and some intermediate nodes could still remain in the final pruned circuit graph. However, due to the removal of many intermediate nodes, the pessimism in the robust optimization formulation is considerably reduced.
2) Practical Issues in Using Graph Pruning for the Robust GP Formulation: By removing a node with $m$ fan-ins and $n$ fan-outs from the circuit graph, the change $\Delta_{con}$ in the number of constraints is $\Delta_{con} = 2(mn - (m + n))$, and the change $\Delta_{var}$ in the number of variables is $\Delta_{var} = -2$, as the variables corresponding to both the rise and fall delays of the pruned node are eliminated. A pruning criterion can, thus, be established as some function $f_{cost}(\Delta_{con}, \Delta_{var})$ of the change in the number of variables and constraints. The pruning procedure operates iteratively: in the first pass, the nodes with the lowest nonpositive $f_{cost}$ are pruned. After the first iteration, the number of fan-ins and fan-outs of the unpruned nodes is recalculated to account for the new edges added to the pruned graph. This iterative procedure continues until all nodes in the graph produce a positive $f_{cost}$. At this point, no more nodes can be removed from the graph according to the given pruning metric. Typically, the pruning criterion is chosen as $f_{cost} = a\Delta_{con} + b\Delta_{var}$, where $a$ and $b$ are normalized weighting factors. However, due to some practical problems in applying the graph pruning method to our formulation, we use a slightly modified pruning cost function. The following discussion explains these practical issues.
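The basic criterion $f_{cost} = a\Delta_{con} + b\Delta_{var}$ and the iterative loop can be sketched in Python as below. Several simplifications are assumptions of this sketch and not part of the method of [21]: parallel edges are merged (fan-ins and fan-outs are stored as sets), one node is pruned per pass instead of a batch, and the weights default to $a = b = 1$. The example graph is hypothetical.

```python
def delta_con(m, n):
    """Change in the number of constraints for m fan-ins and n fan-outs."""
    return 2 * (m * n - (m + n))

DELTA_VAR = -2  # rise and fall arrival time variables of the pruned node

def f_cost(m, n, a=1.0, b=1.0):
    return a * delta_con(m, n) + b * DELTA_VAR

def prune(succ, a=1.0, b=1.0, keep=("PI", "PO")):
    """Iteratively prune nodes with nonpositive cost; mutates `succ`.

    succ: dict mapping every node to the set of its successors.
    Returns the list of removed nodes in removal order.
    """
    pred = {u: set() for u in succ}
    for u, vs in succ.items():
        for v in vs:
            pred[v].add(u)
    removed = []
    while True:
        candidates = [(f_cost(len(pred[x]), len(succ[x]), a, b), x)
                      for x in succ if x not in keep]
        if not candidates:
            break
        cost, node = min(candidates)       # lowest cost first
        if cost > 0:
            break                          # every remaining node is too costly
        for u in pred[node]:               # bridge fan-ins to fan-outs
            succ[u].discard(node)
            succ[u] |= succ[node]
        for w in succ[node]:
            pred[w].discard(node)
            pred[w] |= pred[node]
        del succ[node], pred[node]
        removed.append(node)
    return removed

GRAPH = {"PI": {"a", "b"}, "a": {"x"}, "b": {"x"}, "x": {"PO"}, "PO": set()}
REMOVED = prune(GRAPH)
```

For a node with one fan-in and one fan-out, $\Delta_{con} = -2$ and the cost is negative, so the node is removed. For a node with three fan-ins and three fan-outs, $\Delta_{con} = 6$, and with unit weights the cost is $+4$, so the node stays in the graph.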
The number of delay terms corresponding to the posynomial gate delay models increases in every constraint during the pruning procedure. This results in the following problem for our robust GP formulation. Referring back to our robust GP method described in Section IV-B, we modify each delay constraint to include the terms corresponding to the maximum effect of variations inside the bounded uncertainty ellipsoid model. This is achieved by adding to each constraint the new robust variables $r_1$ and $r_2$, defined in (25), and including the additional constraints given by (27) and (28) in the formulation.
For constraints at each node of the circuit graph, the vector $\phi$ is typically sparse, as this vector consists of entries corresponding to a few parameters affecting only a single gate delay. As a result, the vectors $\eta_1$ and $\eta_2$, respectively derived from the positive and negative terms of the elements of $P^{1/2}\phi$, are also sparse. However, during the graph pruning method, as the intermediate nodes are removed, the number of $d_{ji}$ terms increases in every constraint. Thus, the sparsity of the $\phi$ vector and, consequently, the sparsity of $\eta_1$ and $\eta_2$ are adversely affected. Moreover, as these vectors become dense, the number of monomial terms in the quadratic expansion of the constraints (27) and (28) grows rapidly. As a result, many constraints have monomial terms involving a large number of variables. Consequently, the constraint Jacobian matrix becomes very dense, which can considerably slow down the gradient computations required by convex optimization methods such as the interior point algorithm.
To overcome this potential slowdown of the gate sizing procedure, caused by the increase in the density of the constraint Jacobian matrix, we modify the pruning cost to include a penalty term that is related to the increase in the number of terms in the $\eta_1$ and $\eta_2$ vectors. We define $Mono_{num}$ as the maximum number of monomial terms in all the constraints affected by removing the node under consideration. The cost of pruning this node is then calculated as

$$f_{cost} = a\,\Delta_{con} + b\,\Delta_{var} + c\,\max(Mono_{num} - Mono_{spec},\ 0) \qquad (36)$$
where c is a weight factor, and Mono spec is a user-specified quantity to represent the maximum number of monomial terms allowed in each constraint. A higher value of Mono spec could result in more pruning, but at the cost of potential slowdown in obtaining the solution of the GP optimization problem. Thus, by adjusting the Mono spec parameter, the user can choose an engineering tradeoff between the runtime and the amount of pessimism reduction desired in the gate sizing procedure.
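A short numerical sketch of this modified cost follows, using the form stated in Section V, $f_{cost} = a\Delta_{con} + b\Delta_{var} + c\max(Mono_{num} - Mono_{spec}, 0)$, with the weights $a = 1.5$, $b = 1$, $c = 1$ and $Mono_{spec} = 35$ reported there. The function name and the two sample values of $Mono_{num}$ are hypothetical, and how $Mono_{num}$ is counted for a given node is left outside the sketch.

```python
def pruning_cost(m, n, mono_num, mono_spec, a=1.5, b=1.0, c=1.0):
    """Pruning cost with a penalty on overly dense constraints."""
    d_con = 2 * (m * n - (m + n))    # change in number of constraints
    d_var = -2                       # rise and fall variables removed
    penalty = max(mono_num - mono_spec, 0)
    return a * d_con + b * d_var + c * penalty

# Same node (one fan-in, one fan-out), two hypothetical densities.
sparse = pruning_cost(m=1, n=1, mono_num=30, mono_spec=35)  # -5.0
dense = pruning_cost(m=1, n=1, mono_num=45, mono_spec=35)   # 5.0
```

With 30 monomials the penalty vanishes and the node is pruned. With 45 monomials the penalty of 10 outweighs the structural saving of 5, so the node is kept even though its removal would shrink the problem.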
In the next section, we elaborate on another heuristic method to further reduce the pessimism in our formulation.
E. Using Variable-Sized Ellipsoids
The graph pruning procedure of [21], explained in Section IV-D, helps in eliminating many intermediate arrival time variables and reduces the problem of variation overestimation in our formulation. However, as described in the previous section, it may not be possible to remove all intermediate nodes from the graph and leave only the ones corresponding to the PIs and POs unpruned. The number of fan-ins and fan-outs of a node monotonically increases during the pruning procedure. Therefore, for a given pruning cost of (36), if a node is unsuitable for pruning in any iteration of the pruning method, i.e., it has a positive pruning cost, it will never be pruned under the same criterion. Due to the presence of the unpruned nodes in the circuit graph, the pessimism in our optimization formulation is not completely eliminated.
We present another method, to be employed after the graph pruning procedure, to further reduce the excessive margins from the timing constraints formulated at the unpruned nodes of the graph. This method is based on setting variable margins at different topological levels of the circuit. We use a simple example circuit consisting of just two inverters to explain this method.
Consider the circuit of Fig. 6 consisting of two inverter gates. For this simple circuit, the intermediate node, corresponding to the output pin of the first inverter, can be easily removed to formulate the path delay constraint. However, to illustrate the method of using variable ellipsoids, we do not employ any pruning and instead formulate the constraints for this circuit as (37) and (38). We use different guard bands for the constraints (37) and (38) by employing two uncertainty ellipsoids, $U_1$ and $U_2$, characterized by the constants $\psi_1$ and $\psi_2$, respectively,
where ψ 1 < ψ 2 . As explained in Section III-D, we can use the cdf tables of the χ 2 n distribution to associate probability values α 1 and α 2 with the ellipsoids U 1 and U 2 , respectively. As ψ 1 < ψ 2 , it follows that α 1 < α 2 .
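The link between an ellipsoid size $\psi$ and its probability $\alpha$ can be sketched numerically. The sketch assumes the ellipsoid has the standard form $\{X : (X - X_0)^T P^{-1} (X - X_0) \le \psi^2\}$, so that, for a Gaussian random vector with $n$ components, $\alpha$ equals the $\chi^2_n$ cdf evaluated at $\psi^2$. The function names are hypothetical, and the cdf is computed with a plain series instead of tables.

```python
import math

def chi2_cdf(x, n):
    """Chi-square cdf with n degrees of freedom, computed from the series
    of the regularized lower incomplete gamma function P(n/2, x/2)."""
    if x <= 0:
        return 0.0
    a, z = n / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    k = 0
    while abs(term) > 1e-15 * abs(total):
        k += 1
        term *= z / (a + k)
        total += term
    return total * math.exp(-z + a * math.log(z) - math.lgamma(a))

def psi_for_yield(alpha, n):
    """Smallest psi whose ellipsoid holds probability mass alpha."""
    lo, hi = 0.0, 1.0
    while chi2_cdf(hi, n) < alpha:    # bracket the quantile of psi^2
        hi *= 2.0
    for _ in range(200):              # bisection on psi^2
        mid = 0.5 * (lo + hi)
        if chi2_cdf(mid, n) < alpha:
            lo = mid
        else:
            hi = mid
    return math.sqrt(hi)

psi_1 = psi_for_yield(0.65, n=2)
psi_2 = psi_for_yield(0.85, n=2)
```

Because the cdf is increasing, a smaller probability always maps to a smaller $\psi$, which matches the ordering stated above. For $n = 2$ the result can be checked by hand, since the cdf reduces to $1 - e^{-x/2}$. A library quantile routine such as `scipy.stats.chi2.ppf` could replace the bisection where SciPy is available.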
A simple probabilistic analysis to achieve the timing yield of the circuit of Fig. 6 provides insights into the idea of using variable ellipsoids. By using a smaller ellipsoid U 1 to guardband the timing constraint of (37), we associate a smaller probability α 1 as a lower bound on the chance that this small design margin would be sufficient to meet the constraint in the face of variations. However, even if the design margin is not sufficient to meet this constraint, by employing a larger ellipsoid U 2 , and the corresponding bigger probability α 2 , to pad the timing constraint of (38), we have a better chance to compensate for the violation of constraint (37).
Therefore, the scheme of using a smaller design margin for a lower topological level followed by a sufficiently large design margin for higher levels can still provide the necessary guard banding to achieve the desired timing yield.
For a general circuit with $k$ topological levels, we employ $k$ uncertainty ellipsoids $U_1, U_2, \ldots, U_k$ characterized by the constants $\psi_1, \psi_2, \ldots, \psi_k$ with $\psi_1 < \psi_2 < \cdots < \psi_k$. Since it is extremely difficult to relate the individual ellipsoid sizes to the timing yield specification, we heuristically choose $\psi_k$ to correspond to the lower bound on the specified timing yield $\alpha_k$ and progressively decrease the constants $\psi_{k-1}, \ldots, \psi_1$. The value of $\psi_k$ is determined from the tables of the $\chi^2_n$ distribution. The margins at logic levels $1, \ldots, k - 1$ are determined by setting the probabilities $\alpha_i$ according to (41),
where γ is an empirically determined factor. Using smaller timing margins at lower topological levels, as compared to choosing the same margin at all levels, corresponding to the lower bound on timing yield α k , helps in reducing the pessimism in our formulation. It should be noted that this scheme of using variable-sized ellipsoids is employed for the unpruned nodes only after the graph pruning step. The graph pruning method of [21] , followed by the heuristic scheme of keeping variable guard bands at different topological levels of the final pruned circuit, significantly reduces the problem of overestimation of variation in our gate sizing procedure.
F. Incorporating Spatial Correlations
We use the grid-based spatial correlation models of [19] and [20] to incorporate the intradie correlations between the parameter variations that exhibit spatial dependence, such as the transistor width $w$ and the effective channel length $L_e$.
The widths (channel lengths) of the devices located in the same grid are assigned a perfect correlation factor, device widths (channel lengths) in nearby grids are assigned a high correlation factor, and the ones in faraway grids have a low or zero correlation factor.
For a random vector $\Omega$ representing the variations in $w$ and $L_e$, and its corresponding covariance matrix $P$, the entry $P_{ij} = \sigma_i \sigma_j \rho_{ij}$ denotes the covariance between components $i$ and $j$ of $\Omega$, where $\sigma_i$ is the standard deviation of random variable $i$, and $\rho_{ij}$ is the correlation factor between the random variables $i$ and $j$. By employing the spatial correlation model, the correlation factors between all pairs of elements of $\Omega$ are computed and stamped into the matrix $P$. The ellipsoid uncertainty model, described in Section III-C, then incorporates the impact of correlations in the robust optimization formulation.
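A minimal sketch of stamping $P_{ij} = \sigma_i \sigma_j \rho_{ij}$ from grid positions is given below. The correlation function is an assumption made for illustration: an exponential decay with grid distance, which gives perfect correlation within a grid and progressively lower values for distant grids. It is not the model of [19] and [20], and the grid coordinates and standard deviations are hypothetical.

```python
import math

def correlation(cell_i, cell_j, corr_length=2.0):
    """Assumed correlation factor between two grid cells (exponential decay)."""
    dist = math.hypot(cell_i[0] - cell_j[0], cell_i[1] - cell_j[1])
    return math.exp(-dist / corr_length)   # equals 1.0 within the same cell

def covariance_matrix(cells, sigmas, corr_length=2.0):
    """P[i][j] = sigma_i * sigma_j * rho_ij for devices placed in `cells`."""
    n = len(cells)
    return [[sigmas[i] * sigmas[j] * correlation(cells[i], cells[j], corr_length)
             for j in range(n)]
            for i in range(n)]

# Four devices: two in the same grid, one in a neighboring grid, one far away.
cells = [(0, 0), (0, 0), (0, 1), (3, 4)]
sigmas = [0.1, 0.1, 0.2, 0.1]
P = covariance_matrix(cells, sigmas)
```

In this example the first two devices share a grid, so `P[0][1]` equals $\sigma_0 \sigma_1$, while the entry coupling the first and last devices is scaled by $e^{-5/2}$. All entries are nonnegative under this assumed correlation function.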
G. Complete Sizing Procedure
The complete gate sizing procedure can be recapitulated by the following steps.
1) Generate the initial nonrobust timing constraints by an STA procedure.
2) On the original circuit graph, employ the graph pruning method of [21], described in Section IV-D, to remove as many intermediate nodes as possible according to the pruning cost function of (36).
3) For the final pruned graph, generate new timing constraints using the edge annotations in the final pruned graph.
4) Generate a first-order Taylor series expression for each constraint at the nominal values of the parameters.
5) Employing the uncertainty ellipsoid model, transform each constraint to a set of robust constraints, as described in Section IV-B. For this step, use variable-sized ellipsoids at each topological level of the circuit, as explained in Section IV-E.
6) Solve the resulting GP by using convex optimization tools. The solution of the convex optimization problem provides the gate sizes for the circuit that minimize the area objective subject to the specified timing yield constraints.
V. EXPERIMENTAL RESULTS
The proposed robust gate sizing procedure was implemented in C++, and an optimization package [23] was used to solve the final GP. All experiments were performed on P-4 Linux machines with a clock speed of 3.2 GHz and 2 GB of memory. The robust gate sizing technique was applied to the ISCAS'85 benchmark circuits. The cell library comprised inverters and two- and three-input NAND and NOR gates. We assume capacitive loading for the gates. For simplicity, we consider the variations in the transistor width and the effective channel length as the only sources of variation. However, our approach can be easily extended to incorporate other sources of variation for the gate and interconnect delays. We use a simple Elmore delay model to generate posynomial gate delay models. Our approach works just as well for any other posynomial-based delay model, such as the ones based on generalized posynomials proposed in [16].
We use the spatial correlation models of [19] and [20] to generate the elements of the covariance matrix $P$. To use these spatial correlation models, we first place the circuits using the placement tool Capo [24] and then divide the chip area into a number of grids that depends on the circuit size, so that each grid is no larger than 50 × 50 µm. The standard deviations of the $w$ and $L_e$ parameters are chosen from [25] for a 100-nm technology node. Using this spatial correlation model, all the elements of the covariance matrix $P$ are found to be nonnegative, which simplifies the implementation of the robust constraint generation process. However, the formulation, as described in Section IV-B, does not impose any sign restrictions on the elements of the $P$ matrix. The objective function chosen for the optimization is to minimize $Area = \sum_i a_i w_{i0}$, where $a_i$ is the number of transistors in gate $i$. For each circuit, the value of $T_{spec}$ is chosen to be the point of 15% slack, i.e., $T_{spec} = D_{min} + 0.15(D_{max} - D_{min})$, where $D_{min}$ and $D_{max}$ are the minimum and maximum possible delays of the circuit, found by setting all gates to the minimum and maximum sizes, respectively.
We implement the graph pruning technique of [21] to address the problem of overestimation of variation. As described in Section IV-D-2, we set the pruning cost of a node as f cost = a∆ con + b∆ var + c max(Mono num − Mono spec , 0). For this cost function, we choose a = 1.5, b = 1, and c = 1. We choose different values for the term Mono spec , which determines the maximum number of monomial terms allowed in each constraint. As described in Section IV-E, we employ smaller sized uncertainty ellipsoids at lower topological levels of the circuit and progressively increase the ellipsoid size at higher logic levels. The size of the largest ellipsoid employed at the highest logic level k, characterized by ψ k , is chosen to correspond to the lower bound on the timing yield specification α k . The value of ψ k is determined from the tables of the χ 2 n distribution. The margins at logic levels 1, . . . , k − 1 are determined by using (41) and choosing the factor γ to be in the interval [0.05, 0.10], which corresponds to a 5%-10% decrement from the value of α k , that specifies the lower bound on the timing yield. The value of each ψ i , corresponding to α i in (41), is determined from the cdf tables of the chi-square distribution.
In the first set of experiments, we compare the gate sizing solution obtained by our method with a deterministic gate sizing solution. The deterministic gate sizing is also formulated as a GP using the formulation of Section IV; however, it does not take into account the effect of parameter variations. For our robust optimization procedure, we set the lower bound on timing yield α k = 85% and choose the value of Mono spec = 35. To simulate the effect of parameter variations, we perform Monte Carlo analysis. We refer to the set of gate sizes obtained from the deterministic and the robust optimization as X 0 det and X 0 rob , respectively. Using these sizes, we generate 10 000 samples each from two multivariate normal distributions N 1 (X 0 det , P ) and N 2 (X 0 rob , P ). Next, we perform an STA for each of these samples and record the number of times the circuit meets the specified target delay. The timing yields of the two optimizations are then determined as Y ld det = n det × 100/M and Y ld rob = n rob × 100/M , where n det is the number of samples drawn from the N 1 (X 0 det , P ) distribution that meet the timing requirements, and n rob is the number of samples drawn from the N 2 (X 0 rob , P ) distribution that meet the specified target delay. The total number of Monte Carlo samples is given by M = 10 000. Table I contains the relevant data for this comparison.
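The Monte Carlo yield estimate described above can be sketched as follows. The sketch is self-contained and therefore makes several assumptions: the delay function is a hypothetical two-inverter expression standing in for a full STA run, the covariance matrix and nominal sizes are made-up values, and correlated samples are drawn with a hand-written Cholesky factor.

```python
import math
import random

def cholesky(P):
    """Lower-triangular L with L * L^T = P, for positive definite P."""
    n = len(P)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = P[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def timing_yield(x0, P, delay, t_spec, M=10_000, seed=0):
    """Percentage of samples from N(x0, P) whose delay meets t_spec."""
    rng = random.Random(seed)
    L = cholesky(P)
    n = len(x0)
    met = 0
    for _ in range(M):
        z = [rng.gauss(0.0, 1.0) for _ in range(n)]
        x = [x0[i] + sum(L[i][k] * z[k] for k in range(i + 1))
             for i in range(n)]
        if delay(x) <= t_spec:
            met += 1
    return met * 100.0 / M

def toy_delay(x):
    """Hypothetical delay: gate 1 drives gate 2, gate 2 drives a unit load."""
    return x[1] / x[0] + 1.0 / x[1]

P = [[0.0025, 0.0020], [0.0020, 0.0025]]   # assumed covariance of two widths
X0 = [2.0, 2.0]                            # assumed nominal sizes, delay 1.5
yld_tight = timing_yield(X0, P, toy_delay, t_spec=1.5)
yld_loose = timing_yield(X0, P, toy_delay, t_spec=1.6)
```

Setting the target delay equal to the nominal delay leaves no margin, so a large share of the samples fail, which mirrors the low yields of the deterministically sized circuits. Relaxing the target, or equivalently sizing with a margin, raises the yield. A library sampler such as NumPy's `Generator.multivariate_normal` could replace the manual Cholesky step.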
The first column in Table I lists the benchmark circuit, and the number of gates in each circuit is shown in column 2. The timing yield of the deterministically sized circuits Yield det is listed in column 4 of the table. Since the nonrobust gate sizing method does not take into account the effect of variations, the timing yield, as expected, is quite low for these circuits. Our robust sizing method eliminates nearly all of these timing violations by keeping adequate design margins. Column 7 lists the timing yield Yield rob of the robustly sized circuits. It should be noted that a value of α k = 85%, as a lower bound on the timing yield, is sufficient to provide an actual yield of about 99% for all benchmark circuits. The area overhead that the robust circuits have to employ to safeguard against the parameter variations is shown in the sixth column of Table I. At the cost of an area increase of about 8%-18%, the robustly sized circuits are able to eliminate almost all timing violations. The runtimes of the deterministic and robust sizing procedures are listed in columns 5 and 8 of the table, respectively. As seen in the table, the robust method is much slower than the deterministic sizing procedure. The graph pruning step and the increased problem size of the robust gate sizing procedure, due to the presence of robust variables and constraints, lead to these relatively higher runtimes. However, the overall runtimes of the gate sizing method are very reasonable.
We perform another series of experiments to compare our approach with a gate sizing methodology employing a conventional worst case design approach. The worst case designs are obtained by iteratively solving the standard GP, but for delay specifications that are tighter than the original required target delay, until the area of the worst case design is the same as that of the robust design. These circuits are, thus, designed using a built-in guard band, determined by the difference between the original target delay and the tighter delay specification. Furthermore, to explore the area-robustness tradeoff, we vary the size of the largest uncertainty ellipsoid used by choosing different values of the factor α k , which determines the lower bound on the timing yield of the robustly sized circuits. For these experiments, as before, we set Mono spec = 35 to define the pruning cost function of (36). Having sized these circuits, we perform Monte Carlo simulations to determine the timing yield of the worst case and robust circuits. Table II lists the results of these experiments. As seen from the table, the number of timing violations decreases with an increase in the area for both the worst case and robust circuits. However, in all cases, our robust design has a better timing yield than the worst case design having the same area. On average, the robust design has about 12% greater timing yield than the worst case design having the same area. The better performance of our robust sizing solution is not surprising, since the spatial correlation information, stored in the P matrix, is used by the optimization scheme. The worst case circuit is expected to have a large overhead, since setting tighter delay specifications renders critical some of the earlier noncritical paths. Therefore, the optimizer now has to aggressively size the gates on these paths, which results in a greater transistor area than actually required.
Since the runtimes for our robust gate sizing solutions are not prohibitively high, the user can run the optimization for different values of α k to select the amount of robustness required against the process uncertainties at the cost of an additional chip area. Fig. 7 shows a tradeoff between the timing yield obtained and the circuit area utilized by the robust and worst case solutions for C5315 and C6288 circuits.
In the next set of experiments, we investigate the usefulness of the graph pruning method, and of employing different-sized ellipsoids, in reducing the pessimism in our robust formulation. We first employ graph pruning and use variable-sized ellipsoids to optimize the benchmark circuits. At the highest topological circuit level, we use the largest ellipsoid corresponding to a value of α k = 0.65. At the lower topological levels, we progressively decrease the ellipsoid size by choosing a lower α, as given by (41). We use a value of Mono spec = 35 to set the pruning cost according to (36). These circuits are referred to as Rob 1 designs. Next, we optimize the benchmark circuits without any pruning and using the same-sized ellipsoids at all nodes, determined by the value of α k = 0.65. These optimized circuits are referred to as Rob 2 designs. Table III contains the results of these experiments. The yields of the two designs, i.e., Y ld rob 1 and Y ld rob 2 , are respectively listed in columns 7 and 10 of the table. The areas employed by the Rob 1 and Rob 2 designs are respectively shown in columns 6 and 9 of the table. As seen from these data in Table III, the designs employing the heuristic techniques of graph pruning and variable-sized ellipsoids use about 7%-15% less circuit area than the designs employing no pruning and a constant-sized ellipsoid. The timing yields of the Rob 2 designs are only slightly better, by less than 2% for all circuits, than the timing yields of the Rob 1 designs. This indicates that employing the graph pruning method, and the strategy of keeping variable guard bands for the timing constraints, leads to considerable pessimism reduction in our optimization formulation without a significant loss in the timing yield of the circuit. The runtimes for the Rob 2 designs are smaller than those for the Rob 1 designs.
This is due to the fact that the robust constraints of (27) and (28) have fewer monomial terms for the procedure not employing any pruning compared to the one that prunes some intermediate nodes. As a result, the constraint functions are sparser for the former method, which helps in speeding up the optimization. The absence of the graph pruning step also makes the procedure for Rob 2 design run faster.
In the last set of experiments, we explore the tradeoff obtained by tuning the pruning cost function through the value of the Mono spec term, which regulates the maximum number of monomials allowed in a constraint. This term in the pruning cost of (36) helps in preventing the constraint Jacobian matrix from becoming excessively dense. Table IV contains the results of these experiments. As seen in the table, as the value of the Mono spec term increases, the runtime of the procedure increases. For the larger benchmark circuits, the slowdown of the optimizer is significant, e.g., for the C6288 circuit, the runtime increases by almost 40% when the value of the Mono spec term is increased from 20 to 50. This is due to the fact that for larger circuits, with thousands of constraints, the sparsity of the large constraint matrix has a greater impact on the speed of the convex optimization tool. Although the runtime of the robust optimization method increases for higher values of the Mono spec term, there is also a greater reduction of pessimism in the formulation due to more aggressive pruning. This results in a smaller circuit area for a higher value of the Mono spec term. For example, for the C6288 circuit, there is a 5% reduction in the area when the value of Mono spec is increased from 20 to 50. The timing yield is not significantly impacted by changing the value of the Mono spec term. Based on this tradeoff between the runtime and the reduction in circuit area, the user can appropriately set the value of the Mono spec term to be employed in the pruning cost function of (36).
