Abstract: In this paper we address both empirically and theoretically the impact of an advanced manufacturing phenomenon on the performance of high-speed digital circuits. Using data collected from an actual state-of-the-art fabrication facility, we conducted a comprehensive characterization of an advanced 0.18-μm CMOS process. The measured data revealed a significant systematic, rather than random, spatial intrachip variability of MOS gate length, leading to large circuit path delay variation. The delay of the critical path of a combinational logic block varies by as much as 17%, and the global skew is increased by 8%. Thus, a significant timing error and performance loss take place if variability is not properly addressed. We derive a model that estimates performance degradation for the given circuit and process parameters. We demonstrate explicitly that intrachip $L_{gate}$ variation has a significant detrimental impact on the overall circuit performance, shifting the entire distribution of clock frequencies toward slower values. This is in striking contrast to the impact of interchip $L_{gate}$ variation, traditionally considered in statistical circuit analysis, which leads to the variation of chip clock frequencies around the average value. Moreover, analysis shows that the spatial, rather than proximity-dependent, systematic $L_{gate}$ variability is the main cause of large circuit speed degradation. The degradation is worse for circuits with a larger number of critical paths and shorter average logic depth. We propose a location-dependent timing analysis methodology that allows mitigation of the detrimental effects of $L_{gate}$ variability and have developed a tool linking the layout-dependent spatial information to circuit analysis. We discuss the details of practical implementation of the methodology, and provide guidelines for managing design complexity.
Impact of Spatial Intrachip Gate Length Variability on the Performance of High-Speed Digital Circuits

Accurate modeling of $L_{gate}$ variability is thus of utmost importance for the accurate analysis and design of ICs.
Deep submicron technologies, however, exhibit a new variability pattern that is not addressed by the previously developed models and methods: systematic spatial intrachip $L_{gate}$ variability. As a result, the printed transistors display a distinct spatial map, making their characteristics dependent on the location within the chip. This variation is mainly caused by stepper-induced illumination and imaging nonuniformity due to lens aberrations, which are worst near the optical resolution limit [4], [5]. Because the continuing scaling of semiconductor processes, following Moore's Law, forces us to operate closer to the optical resolution limit of stepper systems, the intrachip variability will only increase. In this work we collected data from a state-of-the-art fabrication facility to study the complex interactions of design and manufacturing. We found a significant spatial variation of the circuit timing properties that leads to degradation of the overall circuit speed if not properly addressed at the design stage. We provide a novel analytical framework that allows estimation of performance degradation for the given circuit and process parameters. The systematic nature of intrachip variability makes previously used approaches to statistical circuit analysis, such as worst case analysis, insufficient and inaccurate. Instead, we propose a new approach that makes the device characteristics dependent on their location within the chip. By accurately predicting the spatial dependence of circuit characteristics, the detrimental effect of intrachip variability can be substantially reduced.
The paper is organized as follows. In Sections II and III, we describe the experimental procedure for accurate characterization of systematic variability and circuit performance simulation results. In Section IV we present a set of analytical models for evaluating the impact of variation on circuit performance. Section V discusses the methodology for location-dependent timing analysis. Section VI discusses the relation between location-dependent timing and traditional worst case analysis.
II. EXPERIMENTAL CHARACTERIZATION OF SYSTEMATIC INTRACHIP VARIABILITY
We performed a comprehensive, silicon-based experimental characterization of an advanced production 0.18-μm logic process technology with the goal of capturing all the relevant variability patterns. One of the most important aspects of the characterization was to address the possible interaction between the global lens aberration and the local layout pattern-dependent nonuniformities due to the optical proximity effect. Toward this end, we classified all the gates into 18 categories depending on their orientation in the layout (vertical or horizontal) and the spacing to the nearest neighboring gate (Fig. 1). To capture a particular lens aberration, the coma effect, we also distinguished the relative position of the surrounding gates, i.e., the neighbor being on the left versus the neighbor being on the right.

0278-0070/02$17.00 © 2002 IEEE

Fig. 1. Spatial profiles depend on proximity effects. We categorize all gates according to their local layout patterns. Vertical gates are labeled VXY, where X is the distance to the nearest neighbor on the left and Y is the distance to the nearest neighbor on the right. Three categories of distance are used: dense, denso, and isolated. Dense refers to the situation where the distance to the nearest gate is equal to the minimum design rule. Isolated spacing corresponds to the situation where no other polysilicon line is within a specified distance. Denso is the intermediate category. Similarly, horizontal gates are labeled HXY, where X is the distance to the nearest neighbor above the gate and Y is the distance to the nearest neighbor below the gate.
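The 18-way classification described above (two orientations times three spacing classes on each side) can be sketched in a few lines. This is only an illustration: the numeric thresholds, the `Gate` record, and the function names are hypothetical, since the text gives the category definitions but not the distances used.

```python
from dataclasses import dataclass
from itertools import product

# Hypothetical thresholds in micrometers; the paper does not state them.
MIN_SPACING_UM = 0.25       # assumed minimum design-rule spacing ("dense")
ISOLATED_SPACING_UM = 1.0   # assumed distance beyond which a side is "isolated"
SPACING_CLASSES = ("dense", "denso", "isolated")


@dataclass(frozen=True)
class Gate:
    orientation: str          # "V" (vertical) or "H" (horizontal)
    first_spacing_um: float   # left neighbor for V gates, neighbor above for H gates
    second_spacing_um: float  # right neighbor for V gates, neighbor below for H gates


def spacing_class(spacing_um):
    """Map a neighbor distance onto one of the three spacing classes."""
    if spacing_um <= MIN_SPACING_UM:
        return "dense"
    if spacing_um >= ISOLATED_SPACING_UM:
        return "isolated"
    return "denso"


def gate_category(gate):
    """Return (orientation, class on first side, class on second side)."""
    return (
        gate.orientation,
        spacing_class(gate.first_spacing_um),
        spacing_class(gate.second_spacing_um),
    )


# Enumerate every category: 2 orientations x 3 classes x 3 classes.
ALL_CATEGORIES = list(product("VH", SPACING_CLASSES, SPACING_CLASSES))
```

Keeping the two sides as separate fields preserves the left/right (or above/below) asymmetry that the text uses to capture the coma effect.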
In order to characterize the spatial profile, a test chip was used which contained a 5 × 5 grid of test modules located across the area of the reticle field. Each module contained long and narrow polysilicon resistors, with a variety of distances to adjacent polysilicon lines. The polysilicon resistors were manufactured with the same process steps as polysilicon gates, including poly CVD, resist coating, exposure, development, and gate definition by plasma etching. One possible source of variability is the variation in the sheet resistance across the reticle field that would confound the measurement results. In order to eliminate this component, each module contained a test structure to calibrate the sheet resistance. Another possible source of variation is the silicide resistance, which is known to cause a large standard deviation in the sheet resistance of thin lines. In order to avoid this source of variation, the polysilicon resistors were not silicided. Finally, a third source of variation would be due to variability in the widths of lines on the test chip mask. This component of variation could not be eliminated and is confounded with our measurement results. However, we believe that this component is small.
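The conversion from a measured resistance to a printed linewidth is not spelled out in the text. A minimal sketch, assuming the standard thin-film relation R = R_sheet × length / width and a locally calibrated sheet resistance, is given below; the function name and the numbers in the usage note are illustrative only.

```python
def linewidth_um(resistance_ohm, sheet_resistance_ohm_per_sq, length_um):
    """Electrical linewidth of a resistor, assuming R = R_sheet * length / width.

    sheet_resistance_ohm_per_sq is the value calibrated on the same test
    module, so across-field sheet resistance variation cancels out.
    """
    if resistance_ohm <= 0:
        raise ValueError("resistance must be positive")
    return sheet_resistance_ohm_per_sq * length_um / resistance_ohm
```

For example, a hypothetical 100-μm-long line with a calibrated sheet resistance of 90 Ω/sq that measures 50 kΩ corresponds to a linewidth of 0.18 μm.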
Data was collected by measuring the resistance of the polysilicon lines; $L_{gate}$ values were then computed. The data set came from measuring 18 test modules coming from three distinct wafers, thus allowing tests for statistical significance. Because the observed spatial and proximity-dependent variability is determined by the combined effect of the stepper and the lens, the same stepper-lens system was used for all the measurements. The spatial $L_{gate}$ maps were measured separately for each gate category (Fig. 2). The range of variation of the $L_{gate}$ surface is 8-12% depending on the category, with a mean of 10.2%. Statistical tests verified that the generated spatial maps of $L_{gate}$ variation over the chip are statistically significant, i.e., that the level of systematic variation is large in comparison with the random noise. It was also found that the maps for different categories have quite distinct spatial behaviors, due to the interaction between global lens aberrations and the pattern-dependent optical proximity effect. Therefore, any accurate approach to timing analysis must consider variation of the mean $L_{gate}$, not limiting itself to the assumption of purely random variation. Also, at least for some gate categories, the distinct spatial maps have to be used in the course of timing analysis.
III. SIMULATION OF THE IMPACT OF SYSTEMATIC INTRACHIP VARIABILITY ON CIRCUIT PERFORMANCE
The presence of systematic spatial $L_{gate}$ variation significantly impacts the timing, and even the functional, properties of integrated circuits. In this section we describe a tool capable of incorporating the spatial information into the verification flow. We use it to study the impact of systematic intrachip variation on design and discuss the implementation details.
Incorporation of systematic $L_{gate}$ information into timing verification requires making the device properties dependent on the device's location within the chip. For this we developed SpaceTimer, a tool with the following functionality (Fig. 3). A netlist is first extracted from the original circuit layout. The layout and the netlist are then passed to SpaceTimer, which classifies each gate as belonging to a particular category and determines the spatial location of the gate within the layout (chip). Using this information together with the set of $L_{gate}$ maps produced at the characterization stage, SpaceTimer generates a modified netlist in which each gate has a proper location-dependent $L_{gate}$ value and simulates the critical paths using a circuit simulator. The simulator can be either a dynamic (SPICE) or a static timing simulator.

The most direct impact of spatial $L_{gate}$ variation is the resulting variation of CMOS gate delay. In fact, because the delay versus $L_{gate}$ relationship is nonlinear, the variation of circuit speed is larger than the $L_{gate}$ variation. We evaluate speed variation by analyzing a 151-stage NAND ring oscillator (RO), often used as a predictor of chip performance. To achieve the highest accuracy, a SPICE simulator is used in this case to generate a spatial RO frequency map [6]. Results showed that the ring oscillator frequency map is consistent with the spatial patterns of $L_{gate}$ variation (Fig. 2); the frequency is highest in the center of the chip, where $L_{gate}$ is minimal. A comparison with the measurements was made for four ring oscillators available for test within each chip, and a good agreement was observed, confirming the accuracy of the simulation result. The range of variation in RO speed across the chip is 14.5% (Fig. 4).
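The netlist annotation step of the flow described above can be illustrated with a short sketch. It is not SpaceTimer's implementation: the bilinear interpolation between characterized grid points, the normalized coordinates, and the dictionary-based device records are all assumptions made for the example.

```python
def interpolate_map(grid, x, y):
    """Bilinear interpolation on a regular grid of measured values.

    grid is a list of rows (at least 2 x 2); x and y are normalized
    coordinates in [0, 1], with x running along a row and y across rows.
    """
    rows, cols = len(grid), len(grid[0])
    fx, fy = x * (cols - 1), y * (rows - 1)
    c0 = min(int(fx), cols - 2)   # left column of the enclosing cell
    r0 = min(int(fy), rows - 2)   # upper row of the enclosing cell
    tx, ty = fx - c0, fy - r0
    top = grid[r0][c0] * (1 - tx) + grid[r0][c0 + 1] * tx
    bottom = grid[r0 + 1][c0] * (1 - tx) + grid[r0 + 1][c0 + 1] * tx
    return top * (1 - ty) + bottom * ty


def annotate_netlist(devices, category_maps):
    """Attach a location- and category-dependent gate length to each device.

    devices is a list of dicts with hypothetical keys "name", "category",
    "x", and "y"; category_maps maps a category label to its grid.
    """
    return [
        {**dev, "l_gate_um": interpolate_map(category_maps[dev["category"]], dev["x"], dev["y"])}
        for dev in devices
    ]
```

The annotated records would then be written back into the netlist passed to the circuit or timing simulator. A per-category dictionary of grids mirrors the separate spatial maps measured for each gate category.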
Such a large variation in device performance also strongly affects the timing behavior of critical paths in the design. We simulated the timing behavior of a benchmark combinational circuit from ISCAS'85 [7], containing 1764 CMOS devices, using the static timing simulator PathMill from Synopsys [8]. Analysis was done for nine spatial locations on the reticle field in a uniform 3 × 3 grid. For chip 4, located in the lower right quadrant, the delay of the same path placed at the fast and slow corners of the chip differed by 16%. The variance of the path delay distribution also differs between the two locations (Fig. 5). Thus, circuit paths with identical designed-for delays will, in reality, have considerably different delay distributions, depending on the physical location of the path within the chip. As a result, the overall critical path delay distribution is broadened around the designed-for delay, with some slower and some faster paths. (The consequences of this effect will be discussed later in the paper.)
Importantly, the order of critical paths also changes depending on the location of the combinational block within the chip. Let us consider the extreme case and compare the top 20 critical paths of the above benchmark ISCAS circuit associated with the spatial points giving the fastest and the slowest path delays: the locations with the smallest and largest $L_{gate}$. We refer to the top 20 critical paths in the fast corner of the chip as the fast set, and to the top 20 critical paths in the slow corner as the slow set. The comparison shows that only six of the paths found in the fast set can also be found in the slow set. In particular, paths common to both sets appear at different ranks in each. This regrouping significantly complicates the use of predesigned and precharacterized circuit blocks physically localized within the chip, such as hard intellectual property (IP) blocks, since their precharacterized behavior will not adequately correspond to the location-dependent $L_{gate}$.

The systematic across-chip variation also affects the global circuit properties, such as clock skew in clock distribution networks containing buffers for driving and restoring the signal. Control of clock skew is critical, since in determining a conservative clock cycle time, a percentage delay due to clock skew is additive to the setup times and hold times of the circuitry. We considered clock skew of a global clock network, distributed using the popular H-tree scheme. The basic intent of such a clock network is to equalize the arrival times of all of the clock signals to the output loads; thus, skew is defined as the maximum difference between any of the clock arrival times. Let $t_i$ be the delay of the clock from the central buffer to output node $i$, and define the skew to be $t_{max} - t_{min}$, where $t_{min}$ and $t_{max}$ are the minimum and maximum delay values for the 16 output nodes. From the simulated values of $t_{min}$ and $t_{max}$ we find the maximum systematic skew for the chip in the upper left quadrant of the field; it is 8% of the total clock cycle. The minimum clock skew on this chip is found to be 47 ps, which is close to 5% of the clock cycle (Fig. 6).
In general, the amount of clock skew introduced by systematic variation will depend on the clock tree design and the size of the chip. Clearly, for the popular H-tree network, the clock skew increases as a function of the size of the chip.
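The skew definition used in this section, the largest difference between any two clock arrival times, reduces to a maximum minus a minimum. The short sketch below shows the computation; the function names and the arrival times in the usage note are made up for illustration and are not simulation results from this work.

```python
def clock_skew(arrival_times_ps):
    """Skew as the largest difference between any two clock arrival times."""
    return max(arrival_times_ps) - min(arrival_times_ps)


def skew_fraction(arrival_times_ps, clock_period_ps):
    """Skew expressed as a fraction of the clock cycle."""
    return clock_skew(arrival_times_ps) / clock_period_ps
```

With hypothetical arrival times of 510, 495, 530, and 502 ps and a 1000-ps clock, the skew is 35 ps, or 3.5% of the cycle.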
IV. ANALYTICAL MODEL OF THE CIRCUIT SPEED DEGRADATION DUE TO $L_{gate}$ VARIATION
We now develop a theoretical framework that allows explicit study of the impact of intrachip gate length variation on complex VLSI circuits in mass production. We show that intrachip $L_{gate}$ variation has a significant detrimental effect on the overall circuit performance, shifting the entire distribution of clock frequencies toward slower values. In contrast, interchip $L_{gate}$ variation, traditionally considered in statistical circuit analysis, leads to variation of chip clock frequencies around the average value.
A. Path Delay Variation Due to Intrachip $L_{gate}$ Variation
We start the analysis by introducing a statistical model of intrachip $L_{gate}$ variation that decomposes the overall variation into three distinct components: the proximity-dependent, the spatial, and the random residual

$$L_{gate} = L_0 + L_{prox} + L_{spat} + \varepsilon. \qquad (1)$$

In this model, $L_0$ is the overall mean. The proximity-dependent term $L_{prox}$ is modeled by a discrete random variable. Its distribution is determined by the frequency of each gate category in the layout and is found on a circuit-by-circuit basis through empirical analysis of the layout. The second term, $L_{spat}$, corresponds to the spatial variation component. In this section, our ultimate goal is to describe path delay variation, so we are primarily interested in the analysis of the variance of $L_{gate}$, rather than the behavior of its mean at a particular position within the chip. For that reason we can approximate $L_{spat}$ by a random distribution. We model $L_{spat}$ by a normal distribution, $L_{spat} \sim N(0, \sigma_{spat}^2)$, because empirical analysis shows that the assumption of normality is justified. The random residual component is also modeled by a normal distribution, $\varepsilon \sim N(0, \sigma_{\varepsilon}^2)$.

The clock frequency, or alternatively the clock cycle $T_{clk}$, at which a circuit can be operated is determined by the delay of the slowest path in the circuit

$$T_{clk} = \max_{i} D_i, \qquad (2)$$

where $D_i$ is the delay of path $i$. Thus, in order to assess the impact of $L_{gate}$ variation on clock frequency, we must ultimately link the variation of $L_{gate}$ to the variation of path delays. We start the analysis by noting that path delay is a sum of the delays of the individual gates. The delay of an individual CMOS gate can be calculated using the standard compact gate delay model [9], given as (3), in terms of the drain currents of the NMOS and PMOS transistors, the supply voltage $V_{dd}$, and the load capacitance $C_L$. For deep submicron MOS devices, the saturation current may be described by the universal empirical equation [9], given as (4).

We can simplify the analysis by assuming that the parasitic junction capacitance is small, so that the load capacitance is set by the gate capacitance of the load. Combining the above equations gives the delay of an individual gate as a function of $L_{gate}$ (5), with all other process parameters collected into a lumped process-specific constant. Note that because we are considering the delay of a gate that is part of a gate chain, (3) represents the delay of one gate (driver) driving another gate (load). The load capacitance stands for the input capacitance of the load gate, so (5) can also be written in terms of the driver and load gate lengths separately.

We can extend this analysis to a path consisting of $N$ gates. Denoting the delay of the $i$th gate stage as $d_i$, the delay of the entire path is given by

$$D = \sum_{i=1}^{N} d_i. \qquad (6)$$

The next step in the analysis is to find the variance of path delay, which follows from (6) and is written as (7). We can find the variance of path delay using the statistical delta method. The method is based upon a series expansion of a function around the nominal point. We use a first-order expansion of the delay function, distinguishing the responses of delay to variation of the driver gate (gate $i$) and the load gate (gate $i+1$). Denoting the nominal delay by $d_0$ and the nominal $L_{gate}$ by $L_0$, we obtain the expansion (8), whose derivative terms follow from (5) and are given by (9). It is useful to distinguish the contributions of the pull-up and pull-down networks within a logic gate stage; doing so turns (9) into (10). In this equation, $\Delta L$ still refers to the gate length deviation of the equivalent transistor network. Ultimately, we need to relate the delay derivation to the $L_{gate}$ of individual polysilicon lines and to the statistical model given by (1). However, since it is easier to carry out the delay analysis in terms of the equivalent transistor network, rather than individual transistors, we need to formulate the statistical model of equivalent gate length from the statistical transistor model of (1).
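For orientation, the following sketch shows one commonly used form of the compact delay model and of the empirical saturation-current fit, together with the first-order expansion they imply. It is offered as an assumption about the general shape of the relations referenced above, not as a recovery of the original equations: the prefactor, the exponents, and the symbols $t_d$, $k$, $L_{drv}$, and $L_{load}$ are all assumed here.

```latex
% Assumed compact gate delay model (driver charging the load capacitance C_L)
t_d \approx \frac{C_L V_{dd}}{4}
      \left( \frac{1}{I_{dsat,n}} + \frac{1}{I_{dsat,p}} \right)

% Assumed empirical saturation-current fit for the driver
I_{dsat} \propto W \, L_{drv}^{-0.5} \, T_{ox}^{-0.8} \, (V_{gs} - V_t)^{1.25}

% Load dominated by the gate capacitance of the next stage
C_L \propto \frac{W \, L_{load}}{T_{ox}}

% Combining the three relations, with k a lumped process constant
t_d \approx k \, L_{load} \, L_{drv}^{0.5}
\quad\Longrightarrow\quad
t_d \approx k \, L^{1.5} \ \text{ when } L_{drv} = L_{load} = L

% First-order (delta-method) expansion around the nominal point (d_0, L_0)
\frac{\Delta t_d}{d_0} \approx
  \frac{1}{2} \frac{\Delta L_{drv}}{L_0} + \frac{\Delta L_{load}}{L_0}
```

Under these assumptions a relative change in gate length produces roughly 1.5 times that relative change in delay, which is in line with the observation in Section III that speed varies more than gate length (a 14.5% ring oscillator speed range against a mean $L_{gate}$ range of 10.2%).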
In order to properly model the proximity-dependent component of variability, we construct the critical path in such a way that it represents the actual distribution of gate categories in the layout, as found by its empirical analysis. First, the critical path has to contain noninverter logic gates, in order to represent the transistors belonging to different gate categories. The delay analysis of such logic gates is similar to that for the basic inverter, with the exception that we describe the parallel or series transistor connection by the equivalent $L_{gate}$. Then, the delay of a complex gate can also be accurately described by (3). Second, in order to recreate the gate category distribution found in the actual layout (Table I), we need to come up with a corresponding distribution of logic gates in the critical path. One feature of our critical path model is that the gates comprising both the pull-up and pull-down networks in a CMOS design belong to the same gate categories. (For example, within a NAND2 gate, the pull-up and pull-down networks are laid out as a V53-V35 transistor pair.) This is in fact consistent with many industrial layouts. The average fan-in of the gates in the critical path is 2, which is also typical for standard CMOS designs [10]. Table II describes the distribution of logic gates in the critical path.
With respect to the spatial component of $L_{gate}$, we assume that the gate chain is spatially confined to a relatively small region of the chip that is significantly smaller than the range of spatial variation. That means that the spatial term $L_{spat}$ takes the same value for all the gates within the critical path. This is a reasonable assumption, since it is good design practice to break up long paths.

Finally, the random residual component $\varepsilon$ is spatially uncorrelated, i.e., the values of this term for any two polysilicon lines within the layout are uncorrelated. Thus, when using the equivalent CMOS gate to describe the pull-up and pull-down networks (as discussed above), we can model the contribution of $\varepsilon$ by the averaged residual term, dependent on the fan-in of the gate. (We again use the observation that the average fan-in across multiple layouts is close to 2.) To simplify the algebra, we will use $\bar{\varepsilon}$ to represent the averaged random residual; given that $\varepsilon \sim N(0, \sigma_{\varepsilon}^2)$ and that the average fan-in is 2, the distribution of $\bar{\varepsilon}$ is $N(0, \sigma_{\varepsilon}^2 / 2)$. Rewriting the equivalent gate length in terms of these components gives (11). Then, substituting (11) into (10), the delay of gate stage $i$ can be represented as (12).

We now use the assumptions about the statistical properties of the various variation components that we established above to simplify (12). Because of the assumed spatial confinement of the gates in the path, the spatial term is the same for the driver and the load gate. And, because of the assumed gate category composition of the transistor networks, the proximity term is the same for the pull-up and pull-down networks. Following (6), we now sum up the individual delay terms to get the total path delay (13). We can now find the variance of the full path delay by summing up the variances of the terms in (13). The variance of the first, constant term is zero. The variance of the other terms is found using the standard statistical equations (e.g., if $a$ is a constant and $X$ is a random variable, then $\mathrm{Var}(aX) = a^2 \mathrm{Var}(X)$). For large $N$, the expression for path delay variance simplifies to (14).

Equation (14) can also be modified to describe the situation in which the path is only partially localized. Let $M$ be the number of spatial partitions of all the gates in the path, and let $n_k$ be the number of gates in the $k$th partition, so that $\sum_{k=1}^{M} n_k = N$. Starting the analysis from (12), it is then straightforward to obtain the path delay variance contributed by the spatial component of variation as a sum over the partitions, and (14) can be rewritten accordingly.
In the next section and beyond, we assume, however, that the combinational logic gate path is spatially localized and use (14) to predict the amount of circuit speed degradation due to the variance of path delay.
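The way a shared spatial term and independent residuals combine in the path delay variance can be checked numerically. The sketch below compares a first-order (delta-method) variance with a Monte Carlo estimate for a fully localized path. It is an illustration under simplifying assumptions, not the paper's (14): each stage delay is taken as $k L^{1.5}$ with a single gate length per stage, the proximity term is omitted, and `K`, `L0`, and `EXPONENT` are hypothetical values.

```python
import random
import statistics

K = 1.0          # hypothetical lumped process constant
L0 = 0.18        # nominal gate length in micrometers
EXPONENT = 1.5   # assumed delay exponent, not taken from the paper


def delta_method_variance(n_gates, sigma_spat, sigma_res):
    """First-order path delay variance.

    The spatial term is shared by all gates (perfectly correlated), so it
    contributes n_gates**2 * sigma_spat**2; the residuals are independent
    and contribute n_gates * sigma_res**2.
    """
    sensitivity = EXPONENT * K * L0 ** (EXPONENT - 1)   # d(delay)/dL at L0
    return sensitivity ** 2 * (n_gates ** 2 * sigma_spat ** 2 + n_gates * sigma_res ** 2)


def monte_carlo_variance(n_gates, sigma_spat, sigma_res, n_samples=20000, seed=0):
    """Sample variance of the path delay using the full nonlinear stage delay."""
    rng = random.Random(seed)
    delays = []
    for _ in range(n_samples):
        shared = rng.gauss(0.0, sigma_spat)   # one spatial offset per path
        delays.append(sum(
            K * (L0 + shared + rng.gauss(0.0, sigma_res)) ** EXPONENT
            for _ in range(n_gates)
        ))
    return statistics.variance(delays)
```

Because the spatial contribution scales with the square of the number of gates while the residual contribution scales only linearly, the spatial term dominates for longer localized paths when the two standard deviations are comparable.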
B. Clock-Cycle Degradation Due to Path Delay Variation
Let us consider a high-speed digital chip being manufactured in volume production. Complex high-performance silicon chips are designed in such a way that there are a large number of paths with delays close to the maximum designed-for delay $D_0$. At the very least, this is the intention of a circuit designer when performing timing analysis and adjustment of the design. Thus, when no variability is present, the distribution of circuit path delays has a sharp peak bounded by $D_0$ (Fig. 7). Let us now analyze the space of path delays when a significant $L_{gate}$ variability is superimposed on the "ideal" timing simulation conditions. The impact of interchip variation can be approximated by a shift of every path delay toward either slower or faster speeds, so that the entire path delay distribution shifts. In contrast, as we saw in Section IV-A, intrachip (spatial and proximity-dependent) variability leads to path delay variation around $D_0$. This happens because, due to their spatial location and composition, some paths will become slower while others become faster (Fig. 7). Now, let $P$ be the set of all circuit paths and let $D_{ij}$ denote the delay of path $i$ in the manufactured chip $j$. Then, the clock period for chip $j$ is

$$T_j = \max_{i \in P} D_{ij}. \qquad (15)$$

This equation is key to understanding the difference in impact of intra- and interchip variability on clock period. As we will demonstrate by the analysis below, interchip variation leads to the variation of chip clock frequencies around the average value. In stark contrast, because the clock period is always defined by the maximum path delay on the chip, intrachip variation and its resultant path delay variance force the maximum path delay to be greater than $D_0$. Thus, the average clock period is uniformly increased, degrading the overall circuit speed (Fig. 8).
We now derive a set of analytical models that allow predicting the clock period degradation due to intrachip variability. Let $N_p$ be the number of paths with delay close to $D_0$ and let $D_j$ be the maximum delay for chip $j$ if there were no intrachip variation, i.e., if the intrachip path delay variance were zero. The path delays are random variables, and for tractability of analysis, we approximate their distribution as multivariate normal with a diagonal covariance matrix. (The Monte Carlo analysis, described below, confirms the validity of this approximation.)

Fig. 8. In contrast to between-field $L_{gate}$ variability, the intrafield variation component degrades average delay, shifting the whole distribution.
Instead of finding $T_j$ analytically, we estimate its expected value as the expected value of the maximum of $N_p$ normally distributed random path delays. The number of trials required, on average, for an event of probability $p$ to happen is $1/p$. Theorem 1: For a deviation factor $\eta$ determined by the number of critical paths $N_p$, the expected clock period for chip $j$ is given by (16).

In other words, intrachip path delay variation causes the clock period for chip $j$ to deviate (on average) by $\eta$ standard deviations of path delay from the chip's designed-for maximum. We can find the expected value of the clock period across multiple chips if we take into account interchip variability, which can be modeled by a normal distribution. Theorem 2: Across multiple chips, the expected clock period is given by (17). In other words, across all the chips, intrachip path delay variation causes the clock period to deviate (on average) by $\eta$ standard deviations of path delay from the designed-for critical path delay $D_0$. Table III gives the values of $\eta$ for several different values of $N_p$, and other values can be found from the table of the normal distribution.
Theorems 1 and 2 clearly show that the interchip variation component leads to variation of a chip's critical path delay around the designed-for critical path delay, while the presence of intrachip variation degrades the average circuit delay. Let us now compare the impact of both inter- and intrachip variation components on the clock period. Theorem 3: The overall deviation of the actual critical path delay from the designed-for value is given by (18), the sum of two terms.

The first term is the variance of the critical path delay due to interchip variation and can be obtained by analogy with (14). The second term in (18) is the shift of the average and is given by Theorem 2. Combining the two gives (19).

This expression allows estimating the relative magnitude of the degradation of the average circuit speed compared to the variation around the average value. In the example considered, the squared deviation of the average speed from the designed-for speed is 1.7 times greater than the random variation around the average. Clearly, the effect of intrachip variability on circuit performance is very significant.
A Monte Carlo simulation was performed for model verification by generating a number of random vectors following the specified distributions of the $L_{gate}$ variation components. We calculated the path delay for each of the vectors and compared the resulting delay statistics with the model predictions. The model for the expected clock period [Theorem 2, (17)] also appears to be very accurate; the average error of prediction is only 1.2%.
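The idea behind the deviation factor can be illustrated with a small experiment on standard normal path delays. The sketch assumes one reading of the "number of trials" argument, namely that the deviation factor is the point a standard normal exceeds with probability one over the number of near-critical paths; this reading, the function names, and the sample sizes are assumptions, and the simulation is not the Monte Carlo study reported above.

```python
import random
from statistics import NormalDist, fmean


def deviation_factor(n_paths):
    """Point exceeded by a standard normal with probability 1 / n_paths.

    Requires n_paths >= 2, since the quantile at probability 0 is undefined.
    """
    return NormalDist().inv_cdf(1.0 - 1.0 / n_paths)


def simulated_mean_max(n_paths, n_chips=2000, seed=0):
    """Average over chips of the slowest of n_paths independent path delays.

    Delays are expressed in units of the path delay standard deviation,
    measured from the designed-for delay.
    """
    rng = random.Random(seed)
    return fmean(
        max(rng.gauss(0.0, 1.0) for _ in range(n_paths))
        for _ in range(n_chips)
    )
```

Both quantities grow with the number of near-critical paths, which is the mechanism by which intrachip variation shifts the average clock period toward slower values. The quantile is only a rough estimate of the simulated mean of the maximum, and the agreement is better for larger path counts.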
We evaluated the impact of $L_{gate}$ variation on circuit performance using the measured characteristics of the production 0.18-μm CMOS process (Section II) for different values of the model variables. To study the potential gains of the location-dependent timing analysis methodology, we considered the reduction of intrachip $L_{gate}$ variability (Fig. 9). Both the Monte Carlo simulation and the model predict an up to 20% degradation of the average circuit speed as a result of intrachip variation. Speed degradation is worse for more complex chips, since they contain more critical paths (larger $N_p$), and for shorter paths (smaller $N$). We also studied the relative sensitivity of speed degradation to the two contributors to intrachip variability, the proximity-dependent ($L_{prox}$) and the spatially-dependent ($L_{spat}$) components. The partial sensitivities were characterized by assessing the reduction in path delay variance, and a corresponding reduction in clock period degradation, as a result of a change in the variance of $L_{prox}$ and $L_{spat}$. We found that spatial intrachip variation has a much stronger effect on degradation of circuit speed than proximity-dependent variation. This is because the averaging of $L_{prox}$ over the gate stages within the path reduces the delay variation due to the proximity effect. The result implies that, from the perspective of improving circuit speed, much more attention should be paid to improving the spatial intrachip uniformity than to reducing the proximity-dependent variability.
V. PRACTICAL IMPLEMENTATION OF LOCATION-DEPENDENT TIMING ANALYSIS
The analysis of the previous section showed that systematic intrachip variability has a large detrimental effect on the overall (average) circuit speed. These negative effects may be reduced by location-dependent circuit analysis, which takes the systematic variation into account. While this may bring much benefit, the practical implementation of location-dependent circuit analysis faces several difficulties. One difficulty is that the systematic, and thus correctable, pattern of spatial variation is specific to a unique combination of the stepper and the lens. Therefore, a correction approach would have to couple the modification of a layout to a specific system. The other important complication is that the proper unit of analysis of the systematic and repeatable spatial profile is the reticle field of a photolithographic stepper machine. The National Technology Roadmap for Semiconductors (1997) projects that for microprocessor products, the number of chips per reticle field will be 2-4 [11]. Hence, for $n$ chips per reticle field, we have to keep track of $n$ distinct designs and optimize them individually, since each chip corresponding to a unique position within the reticle field will have different critical paths and timing properties. Consequently, the highest performance possible is only achievable when chips in each position are optimized individually. But this is expensive and can be justified only for high-end designs.
An alternative approach is to give up optimality in exchange for the simplicity of working with a single design. This may be achieved by using a location-dependent timing analysis based on the combined $L_{gate}$ map. If $n$ is the number of chips per reticle and $L_j(x, y)$ is the $L_{gate}$ map for chip $j$, the combined map is the pointwise maximum

$$L_{comb}(x, y) = \max_{1 \le j \le n} L_j(x, y). \qquad (20)$$

This guarantees that the timing analysis based on $L_{comb}$ is properly conservative, i.e., that the predicted clock period $T_{comb}$ is no shorter than the actual clock period $T_j$ of any chip $j$. (Note that such "collapsing" is still more accurate than "standard" timing analysis, which does not consider location-dependent circuit timing properties. Indeed, unless there is no variation, there must exist a point at which the actual $L_{gate}$ exceeds the value assumed by the standard analysis, so the standard analysis does not give an accurate clock cycle estimate.) If the clock cycle has to be set by a rigidly designed clock generator, based on a fixed estimate of the circuit's critical path, then such "collapsing" leads to suboptimal performance, since any clock-cycle time in excess of the actual (chip-specific) critical path is a direct performance loss. Specifically, the performance loss for chip $j$ is $(T_{comb} - T_j)/T_j$.

For the benchmark combinational circuit c499, for four chips/reticle, the critical path delays are (in ps) 1190, 1330, 1280, and 1380. Collapsing forces all four chips to the clock period of the combined map, and the performance loss is largest for the chip with the 1190-ps critical path. In the clock-skew example, combining the four skew maps into the overall skew map (Fig. 10) raises the maximum skew to 88 ps. Thus, for the first chip (skew is 75 ps), the performance loss is 17%. For the given performance loss and potential gains, a suitable approach may be chosen to balance the tradeoff between higher performance (multiple designs) and simplicity ("collapsing").

It is important to clarify the relation between the location-dependent delay variability analysis, which we considered so far, and the traditional worst case timing analysis, since both deal with deviations from the "nominal" case. In particular, we show that the systematic variation cannot be accurately modeled by traditional statistical methods.
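The "collapsing" approach of Section V and the resulting performance loss can be sketched as follows. The pointwise-maximum combination is an assumed reading of the combined map (it is the simplest rule that keeps the analysis conservative for every chip), and the function names and nested-list map layout are hypothetical.

```python
def collapse_maps(chip_maps):
    """Combine per-chip maps into one map by taking the pointwise maximum.

    chip_maps is a list of maps; each map is a list of rows of equal shape.
    """
    return [
        [max(values) for values in zip(*rows)]
        for rows in zip(*chip_maps)
    ]


def performance_loss(collapsed_value, chip_value):
    """Relative loss from using the collapsed value instead of the chip's own."""
    return (collapsed_value - chip_value) / chip_value
```

With the skew figures quoted in the text, `performance_loss(88, 75)` is about 0.173, matching the 17% loss stated for the first chip. For the c499 delays, taking the collapsed period to be at least the slowest per-chip delay (1380 ps) gives `performance_loss(1380, 1190)` of about 0.16 as a lower bound on the loss for the fastest chip.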
The goal of statistical circuit timing analysis is to determine the probability density function (pdf) of the circuit delay, or, equivalently, its mean and variance. In most cases, however, a tacit assumption is made that the mean is known a priori, and one is concerned only with finding the delay variance. This is the approach taken by the widely used statistical method, worst case analysis [12], [13]. In contrast, the location-dependent timing analysis is concerned with variation of the mean timing properties of the circuit as a function of position.
Despite the formal differences between the two approaches, one could argue that it is possible to get an accurate prediction of the statistical circuit behavior using the traditional worst case analysis if only the value of sigma properly included the spatial and proximity-dependent variability. This is so because systematic variability can always be absorbed in the random variation component. Simulations show, however, that a significant prediction error is likely to occur.
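The traditional statistical worst case analysis assumes the following statistical model: $L_{gate} = L_0 + \varepsilon$, where $L_0$ is the overall mean and $\varepsilon$ absorbs all of the random variation. The worst case value is then set by $L_0$ and the standard deviation of $\varepsilon$. For the location-dependent timing analysis, both the mean and variance are proximity-dependent, and the mean is location-dependent: $L_{gate} = L_k(x, y) + \varepsilon_k$, where $L_k(x, y)$ is the spatial map for proximity category $k$. In this case, the worst case value is set by the local mean $L_k(x, y)$ and the standard deviation of $\varepsilon_k$.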
We compared the accuracy of the above modeling approaches through statistical worst case simulations of the benchmark combinational circuit. The circuit was simulated using worst case values, assuming placement at two locations within the chip: a center point and a corner. The results (Table IV) suggest that such worst case analysis is overly pessimistic, at least for certain spatial locations. Thus, in predicting the worst case behavior of a circuit when it is located in the center of the chip, the traditional worst case analysis gives an error of 11%. This is a significant error for designs with tight timing constraints.
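The source of the pessimism can be seen in a small numerical sketch. It assumes a worst case of the mean plus three standard deviations, which is a common convention and not a value taken from the text, and all numbers and function names below are hypothetical.

```python
def worst_case(mean, sigma, n_sigma=3.0):
    """Worst case value as mean plus n_sigma standard deviations (assumed rule)."""
    return mean + n_sigma * sigma


def lumped_sigma(sigma_random, systematic_offsets):
    """Standard deviation when systematic offsets are absorbed as random variation."""
    mean_square = sum(o * o for o in systematic_offsets) / len(systematic_offsets)
    return (sigma_random ** 2 + mean_square) ** 0.5


# Hypothetical process: nominal length, random sigma, and systematic offsets
# (in micrometers) at the chip center, at an intermediate point, and at a corner.
L0 = 0.18
SIGMA_RANDOM = 0.004
OFFSETS = {"center": -0.008, "middle": 0.0, "corner": 0.008}

# Traditional analysis: one worst case for the whole chip.
traditional = worst_case(L0, lumped_sigma(SIGMA_RANDOM, list(OFFSETS.values())))

# Location-dependent analysis: the mean moves with position, sigma stays random-only.
location_dependent = {
    place: worst_case(L0 + offset, SIGMA_RANDOM)
    for place, offset in OFFSETS.items()
}
```

In this made-up example the traditional worst case is about 0.203 μm everywhere, while the location-dependent worst case is about 0.184 μm at the center and 0.200 μm at the corner, so the single lumped value overstates the worst case most strongly where the systematic offset is favorable.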
VII. CONCLUSION
In this paper, we demonstrated, using experimental evidence gathered from a state-of-the-art 0.18-μm fabrication facility, the presence of significant systematic intrachip $L_{gate}$ variability. This variability causes an error of up to 17% in timing analysis of critical paths, resulting in a corresponding performance loss. The variability also leads to increased global skew of about 8%, which is additive to the setup time error.
We developed a theoretical framework allowing explicit analysis of circuit speed degradation due to intrachip variability. Analysis shows that the spatial, rather than proximity-dependent, systematic variability is the main cause of large circuit speed degradation. The degradation is worse for circuits with a larger number of critical paths and shorter average logic depth.
We proposed a location-dependent timing analysis methodology that allows the analysis and mitigation of the detrimental effects of variability and have developed a tool linking the layout-dependent spatial information to circuit analysis. We showed that the proposed methodology cannot be subsumed by a statistical, e.g., worst case, timing analysis. In a situation of multiple chips per reticle field, one can either treat the problem as a multidesign problem (high-performance) or "collapse" timing information into a single set (simplicity).
