We study the relationship between robustness, predictability and performance of VLSI circuits. It is shown that predictability and performance are conflicting objectives. Performance and robustness are statically conflicting objectives but they are statistically nonconflicting. We propose and develop means for changing a standard timing-driven partitioning-based placement algorithm in order to design more predictable and robust circuits without sacrificing much performance.
INTRODUCTION
Ideally, we would like a design methodology to offer predictable and robust designs at the best performance. High robustness means that the performance of the design is less influenced by noise factors and remains within acceptable limits, i.e., the design is more tolerant to perturbations (such as process variations, temperature and supply voltage changes) and therefore more reliable.
We would also like our design methodology to be predictable. Accurate estimation techniques would allow correct decisions early in the design process, which would result in fast design convergence. Predictability is to be achieved in the face of design uncertainties, which are caused by either incomplete system specification or inherent difficulty of estimating performance metrics during the optimization process.
Several ways of defining predictability or uncertainty have been proposed. Uncertainty has been considered as the unpredictable process variations, which can cause delays to change from their nominal values [10]. Srivastava and Sarrafzadeh define predictability as the quantified form of the accuracy of the cost function estimation [13]. In the context of floorplanning, Bazargan et al. describe uncertainty as multiple values for heights and widths of the same module, and the goal is to minimize a linear combination of the expected value and the standard deviation of the area of the floorplan [14]. Wang et al. consider uncertainty at the placement level as the incomplete information about modules in the netlist [23]. To make routing more predictable at the placement level, one can use techniques for increasing the flexibility of rectilinear Steiner trees [17]. Kahng et al. describe signals as having no uncertainty when they all arrive simultaneously, which means the output of the cell has little or no uncertainty [15]. However, when the uncertainty of the signals varies, simultaneous arrival of all signals will actually cause greater average and standard deviation of the output cell distribution [9]. A design process is considered predictable in [16] when the analytical or statistical predictive tools are accurate and allow providing constraints for the following design steps.
In this paper we analyze the relationship between robustness, predictability and performance (optimality) and seek means for their control. We apply our methodology to timing-driven partitioning-based placement in order to design predictable and robust circuits. We regard the optimization process under uncertainty as the iterative computation of a number of objective functions, which depend on variables whose values are known within a range of values (i.e., as probability distributions or as intervals within which these variables lie). Predictable design means the ability to accurately compute the objective function (within the chosen modeling framework), and to find means of making current estimations closer to the real final values. We use the standard deviation (st. dev., expressed as a fraction of the mean delay value) as the measure of predictability of the overall circuit delay distribution at the primary outputs, as well as at the output of each cell inside the circuit. This means that the smaller the st. dev., the more predictable the delay is. The slope of the variation of the st. dev. of the overall circuit delay, when gate and wire delays change, characterizes the robustness of the circuit.
STATISTICAL TIMING ANALYSIS
We use statistical timing analysis (StTA) as a modeling framework for the purpose of characterizing circuits from the predictability and robustness perspectives. 
PREDICTABILITY ANALYSIS
To understand the mechanisms behind the interaction between the statistical addition and maximum operations and the predictability of a circuit, we performed some studies. Initially, our focus is on two very simple toy cases: (1) a chain (e.g., a path) of delay elements and (2) a generic gate with a given number of input pins with statistical arrival times. The output of the first case is calculated by successive statistical add operations, while the output of the second case is computed using the statistical max of the inputs, followed by the statistical addition of the gate delay. In each case, we want to find the variation of the st. dev. at the output.

Observation 1: We note that for large st. dev. of gate and wire delays (larger than about 9%), circuits that consist of gates with a larger number of inputs show more robustness and better predictability (Fig. 1.b). On the contrary, when the gate st. dev. is small, fewer inputs translate into better robustness.
This observation has an implication for CAD developers: it can be used in a physical synthesis process to minimize circuit delay variations. Gates that are placed in regions with high temperature variations (e.g., close to a floating-point unit that is active only part of the time), and hence larger delay variations, can be mapped to large fanin gates in the library. Note that mapping gates to larger fanin gates could have a negative impact on the area or even the delay of the circuit. Hence, the technology mapping process has to be done judiciously. The observation is good news for interconnect optimization: adding more buffers not only helps reduce the delay but also helps reduce the variation at the output. Furthermore, this observation confirms the intuition that uncertainty adds up: the more uncertainty individual elements in a path have, the more uncertain the output will be. It is important to note that Observation 2 should not be generalized to any chain of gates with possible converging paths. As will be shown in Fig. 5.a, more elements on the chain do not necessarily result in a smaller deviation at the output.
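The two toy cases described above can be sketched with a small Monte Carlo simulation. This is only an illustrative sketch, not the paper's StTA engine: it assumes independent, normally distributed delays whose st. dev. is a fixed fraction of the mean, and all function names and numeric values (element means, gate delay of 2.0, sample count) are hypothetical. It does not attempt to reproduce the 9% threshold of Observation 1.

```python
import random
import statistics


def sample_delay(rng, mean, frac):
    """Draw one delay from a normal distribution with st. dev. = frac * mean."""
    return rng.gauss(mean, frac * mean)


def chain_output(rng, means, frac):
    """Statistical add: the chain output is the sum of its element delays."""
    return sum(sample_delay(rng, m, frac) for m in means)


def gate_output(rng, lat_means, gate_mean, frac):
    """Statistical max of the input arrival times, then add the gate delay."""
    latest = max(sample_delay(rng, m, frac) for m in lat_means)
    return latest + sample_delay(rng, gate_mean, frac)


def simulate(output_fn, n_samples=20000, seed=0):
    """Return the st. dev. of the sampled output as a fraction of its mean."""
    rng = random.Random(seed)
    samples = [output_fn(rng) for _ in range(n_samples)]
    return statistics.pstdev(samples) / statistics.fmean(samples)


# Hypothetical example: a chain of four unit delays and gates with two and
# four equal-mean inputs, all with a 10% st. dev.
print("chain of 4:", simulate(lambda r: chain_output(r, [1.0] * 4, 0.10)))
print("2-input gate:", simulate(lambda r: gate_output(r, [10.0] * 2, 2.0, 0.10)))
print("4-input gate:", simulate(lambda r: gate_output(r, [10.0] * 4, 2.0, 0.10)))
```

Under these independence assumptions the relative st. dev. of a chain of n identical elements shrinks roughly as 1/sqrt(n), which is the "more buffers" effect discussed above; converging paths introduce correlations that this sketch ignores.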
Next, we analyze the behavior of $\sigma_{out}$ when two elements in the chain vary in opposite directions. This situation can happen when the length of a net increases while the length of a different net decreases but the sum of both remains the same (e.g., an inverter free to move to the left or right in Fig. 2.a). A similar situation can appear among the delay elements at the inputs of a gate when the decrease in the latest arrival time at one input can be at the expense of an increase at another input (e.g., a gate free to move to the left or right in Fig. 2.b).
Fig. 2 Study cases
The simulation results for the two cases are shown in Fig. 3. Plots in Fig. 3.a show that the st. dev. at the output of a chain is minimum when $l_2 = l_3$ and that this is more clearly observable when the st. dev. of the elements in the chain increases (bottom). This is true as long as the contribution to $\sigma_{out}$ of the two changing delays is significant (i.e., comparable to the contribution of the rest of the delay elements; small values on the y axis). When the contribution to $\sigma_{out}$ of the two changing delays is very small (e.g., the delays of the wires of lengths $l_2$ and $l_3$ are much smaller than the delay of the wire of length $l_1$ and the gate delays), the plots become almost flat (for large y in Fig. 3) because the effect of the length change of the wires of lengths $l_2$ and $l_3$ is absorbed in the computation of the overall $\sigma_{out}$. When the delay elements are latest arrival delays at the inputs of a gate (Fig. 2.b), and the contribution to $\sigma_{out}$ of the two changing delays is significant, the minimum $\sigma_{out}$ is achieved when $l_2 = l_3$ only for large st. dev. (>9%) of all delay elements. For small st. dev. of all delay elements, the minimum $\sigma_{out}$ is achieved at the extremes, when $l_2 = 0$, $l_3$ = max length or $l_2$ = max length, $l_3 = 0$.
Fig. 3 Standard deviation when one delay element increases and another decreases for a) a series of delay elements and b) delay elements as LATs at gate inputs. The x axis represents the varying length and the y axis is the ratio of the sum of the means of the fixed elements to the sum of the varying means. Default st. dev. of all delay elements is 10% (top) or 25% (bottom).

Observation 3: Plots in Fig. 3 suggest the following intuition to be used in a placement algorithm. To minimize the standard deviation of the gate output delay, the LATs at its inputs should be equalized (Fig. 3.b). Furthermore, the delays of the wires on a critical path should be equalized so that the output deviation is minimized (Fig. 3.a). Observation 3 implies that the placement method should be aware of statistical slack distributions.
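The chain case of Fig. 3.a can be checked by hand under simplifying assumptions that are ours rather than the paper's: the delay elements are independent, each has a st. dev. equal to a fixed fraction $c$ of its mean, and the means $\mu_2$ and $\mu_3$ of the two varying elements sum to a constant $M$ (for instance, wire delay taken as proportional to wire length). The symbols $c$, $M$ and $\sigma_{fixed}$ are introduced here only for illustration.

```latex
\begin{align*}
\sigma_{out}^2 &= \sigma_{fixed}^2 + c^2\mu_2^2 + c^2\mu_3^2,
  \qquad \mu_2 + \mu_3 = M, \\
\mu_2^2 + \mu_3^2 &= \mu_2^2 + (M - \mu_2)^2
  = \frac{M^2}{2} + 2\left(\mu_2 - \frac{M}{2}\right)^2 .
\end{align*}
```

The second line is smallest when $\mu_2 = \mu_3 = M/2$, which matches the equalization suggested for the chain. The varying term is scaled by $c^2$, so the dip is more pronounced for larger st. dev., and it becomes negligible when $\sigma_{fixed}^2$ dominates, which is consistent with the nearly flat plots for large y. The gate-input case involves the maximum operation and does not reduce to this closed form.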

ROBUSTNESS ANALYSIS
To study the relationship between robustness and optimality, we adopt the methodology proposed by Lopez et al. in the context of optimization of interconnection network throughput [8]. The idea of a robust design methodology is to study the effect of factors on the performance and the interactions between such factors. The key point of such an analysis is to correctly identify and classify variables that affect the design performance within the modeling framework of the optimization problem. These variables are classified into two categories: controlled factors and noise factors. Controlled factors are design parameters that can be controlled directly or indirectly. Noise factors are random effects that cause performance variation. In the case of timing-driven placement, the response variable is the overall circuit delay, which should be minimized.
In what follows, we focus our attention on a generic gate (Fig. 1.b, top). For simplicity, we use the maximum difference between the latest arrival times at the inputs of the gate as a control factor (i.e., the length of the range [min, max] within which all latest arrival times lie, called the input range). We choose the input range for three reasons. First, it allows pursuing the same objective as in the case of predictability, as illustrated in Fig. 3.b. Secondly, this factor can be indirectly controlled during placement. The placement algorithm can control the lengths of all nets along paths and thus perform delay budgeting, which affects the latest arrival times at any point inside the circuit. Finally, the selection of this control factor was suggested by the results obtained by Hashimoto et al. [9] and Bai et al. [10]. The noise factor is chosen as the standard deviation of gate and wire delays¹.
The first part of the experiment consists of selecting three different values for the control factor, simulating the model of the gate, generating groups of output samples for each of the three values of the control factor, and analyzing them. We constructed a toy case using a three-input gate² where the control factor (i.e., the input-range) can have one of the values {0, 0.5, 1}, determined by the following three sets of LATs: {10, 10, 10}, {9.7, 9.7, 10.2}, {9.5, 10, 10.5}³. The value of 0 for the input-range represents the case when the latest arrival times have equal means, so that the input-range is zero, and so on. After the gate model is simulated, groups of 10000 samples of the output delay are generated for each value of the control factor. These groups are then analyzed using the ANalysis Of VAriance⁴ (ANOVA) method in Matlab [11]. Fig. 4.a shows the significance of the control factor.
The plot in this figure shows that the smallest mean (i.e., 12.73, shown on top of Fig. 4.a) for the gate delay is obtained when the input-range is 0.5. In this case, the set of latest arrival times {9.7, 9.7, 10.2} is, from a statistical perspective, better than the set {10, 10, 10}. Note that static timing analysis would choose {10, 10, 10} as the best because it has a static delay of 10. The input-range can therefore be used as a control factor in a placement algorithm to achieve a robust design that is less sensitive to delay variations.
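The first part of the experiment can be approximated with a short Monte Carlo sketch. The three LAT sets and the group size of 10000 come from the text; everything else is an assumption made for illustration: normal and independent delays, a 10% st. dev., a hypothetical gate delay of 2.0, and a hand-written one-way ANOVA F statistic in place of the Matlab routine the authors used. The numbers it prints are not expected to match Fig. 4.a.

```python
import random
import statistics

# Input-range value -> means of the three latest arrival times (from the text).
LAT_SETS = {0.0: (10.0, 10.0, 10.0), 0.5: (9.7, 9.7, 10.2), 1.0: (9.5, 10.0, 10.5)}


def gate_delay_samples(lat_means, frac=0.10, gate_mean=2.0, n=10000, seed=0):
    """Sample max(LATs) + gate delay; frac and gate_mean are assumed values."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        latest = max(rng.gauss(m, frac * m) for m in lat_means)
        samples.append(latest + rng.gauss(gate_mean, frac * gate_mean))
    return samples


def one_way_anova_f(groups):
    """F statistic of a one-way ANOVA: between-group over within-group variance."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [statistics.fmean(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


groups = {r: gate_delay_samples(lats, seed=i)
          for i, (r, lats) in enumerate(sorted(LAT_SETS.items()))}
for r, g in groups.items():
    print(f"input-range {r}: mean output delay {statistics.fmean(g):.3f}")
print("F statistic:", one_way_anova_f(list(groups.values())))
```

A large F statistic indicates that the group means differ by more than the within-group scatter would explain, which is the sense in which the control factor is called significant.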
¹ Other possible noise factors, not considered in this experiment, include: correlations between lines inside the circuit due to fanout re-convergence, approximation of the density function of all gate and wire delays with the normal distribution, and input patterns applied at the primary inputs of the circuit, which may not be known during the design.
² We performed similar analyses for gates with different numbers of inputs and similar results were obtained. We restrict our presentation to the three-input case for simplicity.
³ The actual delay value, i.e., 10, is not relevant. Only the range matters.
⁴ ANOVA is a well-known statistical method for studying the effect of control factors on the average value of the response variable, which in our case is the mean delay [8] [11].
The second part of the experiment studies the interaction between the control and the noise factors. Fig. 4.b shows the impact of the noise factor on the mean delay at the output of the gate for the three different values of the control factor. It can be seen that when the input-range is 0.5, the slope of the output delay is smaller than the slope in the case when the input-range is 0. This means that the delay at the output increases at a higher rate when the LATs at the inputs are equal. Therefore, the gate is more robust to variations when the input-range is different from zero. In other words, if our modeling is based on static timing analysis, optimality (the best cell delay is obtained when the input-range is zero) and robustness (the cell is more robust for input-range 0.5, Fig. 4.b) are conflicting objectives. On the other hand, if our modeling is based on statistical timing analysis, optimality (middle box in Fig. 4.a) and robustness (case 0.5 in Fig. 4.b) are non-competing objectives. Results in the above analysis are in agreement with those obtained in [9] and exploited for circuit slack optimization in [10]. These results give us insight into the mechanisms that make a design more optimal or more robust and help in identifying means for controlling them. It is often more costly to control the causes of variations than to make a design process less sensitive to these variations.
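The interaction between the control and noise factors can be illustrated by estimating how fast the mean output delay grows as the noise level increases. As before, this is a sketch under our own assumptions (independent normal delays, st. dev. proportional to the mean, a hypothetical gate delay of 2.0, and noise levels of 5% and 25% chosen arbitrarily); the same random seed is reused for every configuration so that the comparison is not dominated by sampling noise.

```python
import random


def mean_gate_delay(lat_means, frac, gate_mean=2.0, n=20000, seed=0):
    """Monte Carlo estimate of E[max(LATs) + gate delay] at noise level frac."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        latest = max(rng.gauss(m, frac * m) for m in lat_means)
        total += latest + rng.gauss(gate_mean, frac * gate_mean)
    return total / n


def delay_slope(lat_means, frac_lo=0.05, frac_hi=0.25):
    """Secant slope of the mean output delay between two noise levels."""
    lo = mean_gate_delay(lat_means, frac_lo)
    hi = mean_gate_delay(lat_means, frac_hi)
    return (hi - lo) / (frac_hi - frac_lo)


print("slope, input-range 0  :", delay_slope((10.0, 10.0, 10.0)))
print("slope, input-range 0.5:", delay_slope((9.7, 9.7, 10.2)))
```

A smaller slope means the mean delay degrades more slowly as the noise factor grows, which is the notion of robustness used in this section.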
CASE STUDY: PARTITIONING-BASED PLACEMENT
We now describe how we can develop means for modifying a standard partitioning-based placement algorithm in order to achieve more predictable and robust circuits. Based on the analyses presented in the previous sections, we propose a new net-weight assignment scheme that we integrated into a timing-driven partitioning-based placement tool. The goal is to change the behavior of the placement such that the final placement solution is predictable and robust without sacrificing too much performance.
Our placement tool is a modified version of Capo [12]. We developed our customized Capo placement algorithm by replacing the multi-level and flat partitioning algorithms of Capo with a timing-driven version of hMetis, a leading partitioning algorithm [1]. Timing is minimized using timing criticalities (slack-based⁵) as net-weights inside the partitioning engine.
Two main observations motivate the derivation of our new net-weight assignment scheme. The first observation is that the closer a wire is to the POs, the greater the impact that cutting it is likely to have on the circuit delay, as it is more likely to lie on many different critical paths⁶. Second, the st. dev. of the latest arrival times on timing paths decreases from PIs towards POs for large st. dev. of all wire and gate delays inside the circuit. However, the decreasing trend is not maintained for small st. dev. This is shown in Fig. 5.a.

In order to develop a methodology that minimizes output variations, we have to consider two factors: (1) how we can affect the deviation (the control factor discussed in Section 4), and (2) how effective our effort would be in minimizing the deviation at the output of a gate (Fig. 5.a). We can control the deviation by affecting how close the latest arrival times at the inputs of a given gate are. These arrival times should be as close as possible in order for the maximum operation to provide the smallest st. dev. (see Fig. 3.b).

⁶ This observation was directly derived from our placement experiments, and not from the analyses in the previous sections.
The delay of an input can be controlled by changing the bounding box of the net connected to it (in a partitioning-based placement algorithm, cutting a net at a higher level results in larger bounding box). Our new net-weight assignment scheme for large st. dev. is described by the following equation:
where B is a biasing factor to emphasize weights of nets driven by nodes close to the PIs (nodes with small logic depths). The classic slack-based net-weight component is $w^{slack}$ and $w^{lat}$ is the "lat" net-weight component, which is introduced to achieve LAT equalization at the inputs of gates (nodes) with small logic depths. Parameters $\alpha$ and $\beta$ are used to put more weight on the "slack" or the "lat" weight components.
The biasing factor B will have values such that nets at large logic depths are cut more easily in order to capture the phenomenon of decrease of the standard deviation of the propagated LATs as described in the beginning of this section. At small logic depths (where the st. dev. of propagated LATs and the default st. dev. of gate and wire delays are large, Fig. 5.a) we put more emphasis on the "lat" net-weight component, which accounts for equalization of LATs in order to decrease the st. dev. (as shown in Fig. 3, bottom) and to achieve robustness (as described in Section 4).
SIMULATION EXPERIMENTS
We report simulation results obtained with the placement algorithm described in the previous section for a set of MCNC91, IWLS93 [21] and ITC99 [22] (last two in Table 1) circuits in two different scenarios. In the first scenario, after the circuit is placed we set all gate and wire delays as samples of the corresponding distributions. This case mimics the manufacturing process, in which gate and wire delays can vary due to process variations [3]. In the second scenario we model the increase of gate and wire delays due to temperature increase. We consider a pattern where gate and wire delays increase by 25% at the center of the chip, by 5% at the boundaries, and by 10% elsewhere. Although the temperature pattern on a chip can be different and the increase of delay larger [18] [19], we consider a rather simple pattern for simplicity. The simulation results (average of ten different runs) are presented in Table 1. After the placement is performed using the classic placement algorithm and our placement algorithm, we compute the circuit delay, denoted as Delay, using static timing analysis (static delay is computed using the Elmore delay for the lumped RC wire model and the half perimeter of the bounding box for the wire-length of a net). Then, we model gate and wire delays as random variables and perform statistical timing analysis to obtain the St. Dev. of the overall circuit delay.

St. Dev. characterizes predictability, while Delay-change shows how robust the placement is. It can be seen that the standard deviation of the overall circuit delay is consistently smaller for placements obtained with our placement algorithm (the rather small differences are due to the convergence property described in Fig. 5.a), which means better predictability. The delay changes after noise injection are smaller for placements obtained with our placer, which means better robustness (34% and 8% on average in scenarios 1 and 2). A main benefit of better predictability and robustness at the same delay is the improved manufacturing yield. The run time is not reported because both placement algorithms have almost the same run-time.
The run-time for the largest circuit is around 2 hours on a 1.5GHz CPU, 2GB memory running on Linux.
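The static delay computation mentioned in this section combines a half-perimeter wire-length estimate with a lumped RC wire model. The sketch below shows one common way to write that combination; it is an assumption-laden illustration, not the authors' implementation. In particular, it assumes the whole wire capacitance is lumped at the far end of the wire resistance, and the parameter names (`r_per_unit`, `c_per_unit`, `r_driver`, `c_load`) and numeric values are hypothetical.

```python
def half_perimeter(pins):
    """Half perimeter of the bounding box enclosing a net's pin coordinates."""
    xs = [x for x, _ in pins]
    ys = [y for _, y in pins]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def lumped_rc_delay(length, r_per_unit, c_per_unit, r_driver, c_load):
    """Elmore delay of a driver plus a wire modeled as a single lumped R and C.

    Assumption: all wire capacitance sits after the wire resistance, so both
    the driver resistance and the wire resistance charge (C_wire + C_load).
    """
    r_wire = r_per_unit * length
    c_wire = c_per_unit * length
    return r_driver * (c_wire + c_load) + r_wire * (c_wire + c_load)


# Hypothetical net with three pins and made-up electrical parameters.
net_pins = [(0, 0), (3, 4), (1, 2)]
length = half_perimeter(net_pins)
print("wire-length estimate:", length)
print("static wire delay:", lumped_rc_delay(length, 1.0, 1.0, 2.0, 3.0))
```

Because the delay grows with the bounding box, cutting a net early in the partitioning hierarchy (which enlarges its bounding box) directly increases the estimated delay of that net, which is why net-weights are an effective handle on timing.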
CONCLUSIONS
Analyses of the relation between predictability, robustness, and performance of VLSI circuits were presented. They served to find novel means to change a standard timing-driven partitioning-based placement algorithm in order to design more predictable and robust circuits without sacrificing much in performance.

Table 1 Comparison of the proposed placement algorithm to the classic net-based timing-driven partitioning-based placement. Delay is the delay reported by a static timing analysis algorithm and St. Dev. is the standard deviation of the overall circuit delay after placement (the smaller it is, the more predictable the circuit is). Delay-change is the change in static delay after the placement is changed in scenarios 1 or 2 (smaller means the circuit is more robust).
