Process variations result in a considerable spread in the frequency of the fabricated chips. In high performance applications, those chips that fail to meet the nominal frequency after fabrication are either discarded or sold at a loss which is typically proportional to the degree of timing violation. The latter is called binning. In this paper we present a gate sizing-based algorithm that optimally minimizes the binning yield-loss (BYL).

In high performance systems, fabrication variability results in a considerable spread in the frequency of the chips (about 30% according to [15]). In some cases the chips that violate the timing constraint are simply discarded, and in other cases they are sold at a loss. In the latter case, those chips that fail to meet the nominal frequency after fabrication are binned based on their speed. Some work such as [2], [3] or non-Gaussian [5], [18] [...]

[...] [15]:

$$BYL = \int_{T_{cons}}^{\infty} (t - T_{cons})\, f_T(t)\, dt \qquad (2)$$

In this paper we will minimize the BYL based on this penalty:

$$BYL(s) = \int_{T_{cons}}^{\infty} (t(s) - T_{cons})\, f_T(t(s))\, dt \qquad (3)$$

where $s$ is a vector of the gate sizes in the design. In this paper optimization of $BYL(s)$ is done over $s$.

[...] transistors. The inequalities of 5 will hence be a posynomial formulation.

$$S_{min} \le e^{x_i} \le S_{max} \quad \forall \text{ gate } i$$

The above formulation will consequently have an exponential optimization form, which is convex with respect to $x_i$ [16].
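As a rough illustration of the BYL integral in equation 3, the sketch below evaluates the expected timing violation for one fixed set of gate sizes. It assumes the integral is taken over delays above the timing constraint and, purely for the sake of the example, that the circuit delay is Gaussian; the function names `byl_gaussian` and `byl_numeric` and the parameters `mu`, `sigma` and `t_cons` are hypothetical and do not come from the paper.

```python
import math


def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    """Standard normal cumulative distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def byl_gaussian(mu, sigma, t_cons):
    """Closed form of E[max(T - t_cons, 0)] for T ~ N(mu, sigma^2).

    Assumption: Gaussian circuit delay (illustrative only).
    """
    z = (mu - t_cons) / sigma
    return (mu - t_cons) * normal_cdf(z) + sigma * normal_pdf(z)


def byl_numeric(mu, sigma, t_cons, steps=20000, span=10.0):
    """Trapezoidal approximation of the same integral.

    Integrates (t - t_cons) * f_T(t) from t_cons up to mu + span * sigma.
    """
    upper = max(t_cons, mu + span * sigma)
    if upper == t_cons:
        return 0.0
    h = (upper - t_cons) / steps
    total = 0.0
    for i in range(steps + 1):
        t = t_cons + i * h
        value = (t - t_cons) * normal_pdf((t - mu) / sigma) / sigma
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * value
    return total * h


if __name__ == "__main__":
    # Hypothetical numbers: mean delay equal to the constraint, 10% sigma.
    print(byl_gaussian(1.0, 0.1, 1.0))
    print(byl_numeric(1.0, 0.1, 1.0))
```

Under these assumptions a faster design (smaller mean delay) gives a smaller value, and the value approaches zero once almost all of the delay distribution lies below the constraint.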
Let $\{t^{*(1)}, q^{*(1)}\}$ and $\{t^{*(2)}, q^{*(2)}\}$ be the optimal solutions of the left and right inequalities respectively. Multiplying [...] $\{t^{(3)} = \theta t^{*(1)} + (1-\theta) t^{*(2)},\; q^{(3)} = \theta q^{*(1)} + (1-\theta) q^{*(2)}\}$. By replacing these definitions in the inequalities of 19 we will obtain: [...]

This implies that for $x = x^{(3)}$ the following set:

$$\{t^{(3)} = \theta t^{*(1)} + (1-\theta) t^{*(2)},\; q^{(3)} = \theta q^{*(1)} + (1-\theta) q^{*(2)}\}$$

is a feasible solution to the inequalities of 14. Therefore the optimal solution at $x = x^{(3)}$ must be smaller than (or equal to) $\theta q^{*(1)} + (1-\theta) q^{*(2)}$. The optimal solution is $v(\theta x^{(1)} + (1-\theta) x^{(2)})$. Therefore, [...]

5. SOME GENERALIZATIONS

In fact our approach could be used as a heuristic for optimizing YL. But there are some important results that can be proved about the optimality of YL as illustrated below:

Theorem: The optimal BYL is 0 iff the optimal YL is 0.

Proof: Let us suppose we have a solution for which $BYL = 0$. Referring to equation 12, this can happen only if $f_v(v) = 0$ for all $v$ greater than (not equal to) 0. This means that the pdf of the timing of the circuit (for the given gate sizes) lies entirely within the timing constraint. Thus $YL = 0$. Now let BYL be more than zero, therefore $f_v(v)$ must have a positive value for some $v$ greater than 0. Therefore, some part of the timing pdf must be greater than $T_{cons}$. Thus YL cannot be zero.

This is an important result, since by optimizing BYL we can 1) achieve a solution for which $YL = 0$, 2) or by looking at the optimal value of BYL check if a solution with $YL = 0$ exists.

6. SOLVING THE CONVEX FORMULATION

In the previous sections we proved that our proposed formulation [...] bound on the objective function. As the number of iterations increases, the linear lower bounds of the previous iterations converge to the accurate objective. At any iteration $k$, the objective function represented by the piecewise linear lower bounds is optimized while satisfying the feasibility criteria of the constraints. This gives us a solution vector $x_k$. [...] gives us a new value for $x_k$ and the entire process is repeated till the lower bound approximation and the upper bound are within a user specified range of tolerance (note that each $x_k$ corresponds to an upper bound $BYL(x_k)$) [16].

Algorithm 1

Step 1: Initialize
Let $\epsilon > 0$ and $x_1$ be a feasible solution satisfying the constraints. Let $k = 0$ and define $l_0(x) = -\infty$, $u_0(x) = \infty$.
[...]
Step 3: Define the Lower Bound at $x_k$
Evaluate $\alpha_k$ and $\beta_k$ such that $l_k \ge \alpha_k + \langle \beta_k, x \rangle$.
Step 4: Update the Optimization Set
Add the following to the existing set of constraints: $l_k \ge l_{k-1}$, $l_k \ge \alpha_k + \langle \beta_k, x \rangle$.
Update the objective function to Minimize $l_k$.
Step 5: Solve the Optimization to get $x_k$ and Update the [...]
Step 6: Stopping Rule
Stop if $u_k - l_k < \epsilon$, otherwise go to Step 1.

Next we will explain how the statistical timing analysis (STA) can be integrated as a useful tool in our formulation, and in the Kelley's algorithm to find lower bound on BYL.

Please note that in case that the optimization of area and/or power is necessary, new constraints can be added to our formulation that bound the overall area or power. These can be expressed as convex constraints which allow the use of Kelley's algorithm to solve the new optimization formulation.

6.2 Integration with STA

6.2.1 Computing the BYL

Given a gate-level circuit, statistical timing analysis can be used to efficiently compute the BYL. In section 4.3 we explained how BYL can be computed parametrically over all samples $\omega$ in $\Omega$ and for a particular set of gate sizes using equation 13. It can also be equivalently obtained using equation 12 [...]

[...] $\langle \beta_k, x \rangle$ in Algorithm 1. As expressed in step 3 of the algorithm, $\beta_k$ is found by evaluating the slope of the $BYL(x)$ at $x_{k-1}$. The coefficient $\alpha_k$ is found such that $BYL(x_{k-1}) = \alpha_k + \langle \beta_k, x_{k-1} \rangle$. Therefore in order to find the lower bound, it is sufficient to show the computation of $\beta_k$. [10] proposes ways that allow the sensitivity to be more efficiently computed. Once $\beta_k$ is found, $\alpha_k$ and consequently the lower bound are determined.

Note that the STA at any of these stages can be done using any of the proposed techniques in the literature such as [5] or [18], and can assume any distribution for $\Omega$ and any correlation model for its components.

[...] [18]. We assumed a variability in the $L_{eff}$ of each device with a Gaussian distribution with a mean equal to the nominal value and a 12% standard deviation from the mean. We determined the convex expression for the delay of each gate as a function of its size assuming a 90nm technology (for which we got the information from [19] [...]

[...] the proposed method using the Kelley's algorithm, we integrated STA method of [3] [...] implemented a sensitivity-based approach as well as a worst-case method. The sensitivity method had a framework as in [1] or [4]. In this method initially all the gates are set to their minimum size. The sensitivity-based method is a greedy iterative approach, in which at each iteration the most sensitive gate is determined and sized up. The most sensitive gate is the one that results in the maximum change in the objective due to a small change in its size. For comparison of this method with ours we set the objective of the sensitivity-based approach to be the BYL.

[...] gate was computed assuming the value of $L_{eff}$ is fixed at its worst ($\mu + 3\sigma$). In this approach we set the optimization objective to be minimization of the arrival time at the primary output nodes. We also added a new constraint to impose an upper bound on the maximum area of this approach. In order to make comparison with our proposed method, we set this maximum area of the worst-case approach to be the area of the optimal solution generated by our approach.

[...] timing, the worst-case approach however was able to generate solutions of good quality comparable to our method. Compared to sensitivity-based approach, we achieved an average of 72% improvement in the BYL with only a 6% area overhead given the stringent timing constraint. We also [...]

Figure 2 shows the optimization of objective over time using our approach compared to the sensitivity-based method [...]

We also compared the traditional Yield-Loss of the solution generated by our approach to a sensitivity-based approach in which the most sensitive gate was defined as the one with maximum change in Yield-Loss due to the change in its size. Our method also improves the Yield-Loss on average by 61%.

Finally figure 3 shows the curve generated by our approach between the area and BYL. Each point corresponds to the solution of an iteration of Kelley's algorithm. It can be seen that as the iterations progress, increase in area results in a decrease in BYL.
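The cutting-plane iteration of Kelley's algorithm used throughout this section can be sketched in a few lines of Python. This is a minimal one-dimensional illustration under stated assumptions, not the authors' implementation: `toy_objective` is a hypothetical convex stand-in for $BYL(x)$ and is not derived from any timing model, the names `kelley_1d`, `f` and `grad` are made up for the example, and the master problem is solved by enumerating intersections of the linear cuts instead of calling an LP solver.

```python
import math


def kelley_1d(f, grad, lo, hi, eps=1e-6, max_iter=200):
    """Kelley's cutting-plane method for a convex f on the interval [lo, hi].

    Returns (best_x, lower_bound, upper_bound).
    """
    cuts = []  # pairs (alpha_k, beta_k) with f(x) >= alpha_k + beta_k * x
    x = lo     # any feasible starting point
    lower, upper, best_x = -math.inf, math.inf, x

    def model(y):
        # Piecewise-linear lower bound built from all cuts so far.
        return max(a + b * y for a, b in cuts)

    for _ in range(max_iter):
        fx, gx = f(x), grad(x)
        if fx < upper:                 # each evaluated point is an upper bound
            upper, best_x = fx, x
        cuts.append((fx - gx * x, gx))  # tangent of f at x

        # Master problem: minimize the model over [lo, hi]. In one dimension
        # the minimum is at an endpoint or at an intersection of two cuts.
        candidates = [lo, hi]
        for i in range(len(cuts)):
            for j in range(i + 1, len(cuts)):
                (a1, b1), (a2, b2) = cuts[i], cuts[j]
                if b1 != b2:
                    y = (a2 - a1) / (b1 - b2)
                    if lo <= y <= hi:
                        candidates.append(y)
        x = min(candidates, key=model)
        lower = max(lower, model(x))   # lower bound never decreases

        if upper - lower < eps:        # stopping rule
            break
    return best_x, lower, upper


def toy_objective(x):
    # Hypothetical convex function of the log-size x (not a real BYL model).
    return math.exp(-x) + 0.1 * math.exp(x)


def toy_gradient(x):
    return -math.exp(-x) + 0.1 * math.exp(x)


if __name__ == "__main__":
    print(kelley_1d(toy_objective, toy_gradient, -2.0, 4.0))
```

Working in the variable $x$ with size $e^{x}$ mirrors the exponential change of variables in the formulation, which is what makes the tangent lines valid global lower bounds. For a real design the master problem would be a linear program over all gate sizes, and the slope would come from the STA-based sensitivity computation rather than an analytic gradient.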
