I. INTRODUCTION
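The mismatch model proposed by Pelgrom et al. [1] gives the standard deviation of the mismatch in a property P between two identical MOS transistors of width W and length L, separated by a center-to-center distance D in the layout, as

$$\sigma^2(\Delta P) = \frac{A_P^2}{W L} + S_P^2 D^2 \qquad (1)$$

where $A_P$ and $S_P$ are the proportionality constants of the size-dependent term and the distance term, respectively, for property P.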
Pelgrom verified this model experimentally by measuring the mismatch on many dies, fabricated in many runs, with many identical transistors per die as well as many transistor sizes per die (see Fig. 2 in [1]). The model was verified for different foundries and technologies. Theoretical derivations of this model (see Appendix A) reveal that the two terms have different origins. The size-dependent term is caused by random fluctuations of the material and technological properties of the transistors. For example, the dopant concentration along wafers and dies can be modeled as a constant term around which random fluctuations exist. These random fluctuations are averaged over the transistor area W L that contributes to a specific transistor electrical property (like threshold voltage or beta). The larger the area, the smaller the impact of the random fluctuations. Therefore, this random-induced mismatch term decreases with the transistor area.

On the other hand, the distance term in (1), which is size independent (common to all sizes), originates from gradients along the dies and wafers. For example, dopant concentrations follow smooth systematic surfaces along the wafers. Such surfaces are usually well known by IC manufacturers, although rarely made public. For a particular die, the surface can often be approximated by a plane. Depending on the die position within the wafer, a different gradient plane results. Consequently, in practice, this gradient plane has a random nature from die to die. Since Pelgrom measured many dies, his model includes the statistical characterization of these random gradient planes. Layout techniques, like common centroids, can eliminate the impact of gradient-induced mismatch and consequently eliminate the distance term from (1). However, layout techniques can never eliminate the random-induced size-dependent term because of its purely random nature.
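As a quick numerical illustration of how the two terms trade off, the following minimal sketch evaluates the pair mismatch assuming the usual form of Pelgrom's model, σ²(ΔP) = A_P²/(WL) + S_P² D². The function name and all numeric values are hypothetical and chosen only for illustration; they do not come from any foundry data.

```python
import math


def pelgrom_sigma(a_p, s_p, width, length, distance):
    """Standard deviation of the mismatch in P for a transistor pair.

    a_p      : area coefficient A_P (units of P times length)
    s_p      : distance coefficient S_P (units of P per length)
    width, length, distance : W, L and D, all in the same length unit
    """
    random_term = a_p ** 2 / (width * length)  # averaged local fluctuations
    gradient_term = (s_p * distance) ** 2      # die-level gradient contribution
    return math.sqrt(random_term + gradient_term)


# Hypothetical coefficients (illustration only): A_P = 10, S_P = 0.002, D = 100
sigma_small = pelgrom_sigma(10.0, 0.002, 1.0, 1.0, 100.0)    # 1 x 1 device
sigma_large = pelgrom_sigma(10.0, 0.002, 10.0, 10.0, 100.0)  # 10 x 10 device
```

Enlarging the device shrinks only the first term; the distance term is unchanged, which is why it has to be addressed through layout instead.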
In the next section, we describe how both terms of the Pelgrom model can be modeled in a circuit simulator. Then, in Section III, we explain another implementation based on a nonphysical interpretation. In Section IV, we give some hints on how to further improve the modeling of gradient-induced mismatch in CAD tools, and finally, in Section V, we give our conclusions.
II. IMPLEMENTING PELGROM'S MISMATCH MODEL
The random-induced size-dependent term is straightforward to add to a circuit simulation tool. First, one needs to know the critical transistor mismatch parameters, i.e., those whose random fluctuations impact transistor currents. They are usually few. Pelgrom suggested two main ones ($V_{T0}$ and $\beta$) and a secondary one ($\gamma$) [1]. Bastos et al. added mobility degradation [2], [3], Serrano et al. suggested a total of five relevant parameters [4], [5], and recent models extending from weak to strong inversion have been proposed with no more than five parameters [6]-[8]. For more sophisticated transistor models like BSIM, about 16 parameters would be required [9]. Consider a generic mismatch-relevant parameter P. Let $P_{\mathrm{mean}}$ be the mean value predicted by the manufacturer. Usually, the manufacturer also characterizes a global interdie variation $\Delta P_{\mathrm{global}}$, common to all transistors in the same die, whose standard deviation $\sigma(\Delta P_{\mathrm{global}})$ characterizes variations from die to die. Finally, a local term $\Delta P_i$ has to be added for each transistor in the circuit, such that its standard deviation is characterized by (1). This local mismatch term includes two components: a random size-dependent component $\Delta P_{\mathrm{rand}\_i}$ and a gradient-induced size-independent component $\Delta P_{\mathrm{grad}\_i}$. The standard deviation of the random component is given by

$$\sigma(\Delta P_{\mathrm{rand}\_i}) = \frac{A_P}{\sqrt{2 W_i L_i}}.$$
Here, $W_i$ and $L_i$ are the width and length of transistor i. The factor 2 in the denominator accounts for the fact that each transistor deviates from a nominal mismatchless transistor. Parameters $A_P$ are provided by the manufacturer for the mismatch-relevant parameters. Sometimes, correlations between these parameters are also characterized. In these cases, it is convenient to reflect them when generating the random numbers $\Delta P_{\mathrm{rand}\_i}$ for each parameter [4]. Now, let us add the distance term of (1). Assume that we have the layout of our circuit and know the central coordinates $(x_i, y_i)$ of each transistor i in the layout. Assume also that the manufacturer provides parameter $S_P$ of the distance term for parameter P in (1), and that for each fabricated die, we can approximate the gradient of P along the die by a plane $P(x, y) = Ax + By + C$, where $C = P_{\mathrm{mean}} + \Delta P_{\mathrm{global}}$, and A and B are random numbers. Consider now two transistors i and j located at coordinates $(x_i, y_i)$ and $(x_j, y_j)$ (see Fig. 1). The mismatch in property P caused by the gradient plane of the die is

$$\Delta P_{ij} = P(x_i, y_i) - P(x_j, y_j) = A(x_i - x_j) + B(y_i - y_j).$$
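Repeating this for many dies, generating zero-mean, uncorrelated random numbers A and B for each die, we can compute the variance of this mismatch between transistors i and j

$$\sigma^2(\Delta P_{ij}) = \sigma^2(A)(x_i - x_j)^2 + \sigma^2(B)(y_i - y_j)^2.$$

Assuming symmetry of the random planes, $\sigma(A) = \sigma(B)$ (no preferred directions), results in

$$\sigma^2(\Delta P_{ij}) = \sigma^2(A) D_{ij}^2 \qquad (2)$$

where $D_{ij}$ is the distance between transistors i and j. Comparing (2) with the distance term in (1) reveals that

$$\sigma(A) = \sigma(B) = S_P. \qquad (3)$$

0278-0070/$25.00 © 2007 IEEE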
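The relation between the plane statistics and the distance term can be checked numerically. The sketch below is only an illustration under stated assumptions: the slopes A and B are drawn as independent zero-mean Gaussians with standard deviation S_P (the Gaussian shape is our assumption; the argument only needs zero mean, equal variances, and no correlation), and the function name and coordinates are hypothetical.

```python
import math
import random


def gradient_mismatch_std(s_p, pos_i, pos_j, n_dies=20000, seed=1):
    """Monte Carlo estimate of the std of the gradient-induced pair mismatch."""
    rng = random.Random(seed)
    dx = pos_i[0] - pos_j[0]
    dy = pos_i[1] - pos_j[1]
    samples = []
    for _ in range(n_dies):
        a = rng.gauss(0.0, s_p)  # plane slope along x for this die
        b = rng.gauss(0.0, s_p)  # plane slope along y for this die
        samples.append(a * dx + b * dy)
    mean = sum(samples) / n_dies
    var = sum((s - mean) ** 2 for s in samples) / (n_dies - 1)
    return math.sqrt(var)


s_p = 0.002  # hypothetical distance coefficient
estimate = gradient_mismatch_std(s_p, (0.0, 0.0), (300.0, 400.0))
expected = s_p * math.hypot(300.0, 400.0)  # S_P * D_ij with D_ij = 500
```

With enough simulated dies, `estimate` should approach `expected` regardless of the direction of the line joining the two transistors, which is the isotropy assumed when setting σ(A) = σ(B).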
Consequently, if the manufacturer provides parameter $S_P$, we can generate the random gradient planes for each simulated die. In summary, according to the 1989 findings of Pelgrom [1], a CAD tool implementing all these mismatch components of a MOS transistor should compute, for each transistor i and for each of its mismatch-relevant parameters $P_i$, a value composed of the following terms:

$$P_i = P_{\mathrm{mean}} + \Delta P_{\mathrm{global}} + \Delta P_{\mathrm{rand}\_i} + \Delta P_{\mathrm{grad}\_i}.$$
To compute $\Delta P_{\mathrm{rand}\_i}$, a random number needs to be generated for each transistor i. However, to compute $\Delta P_{\mathrm{grad}\_i}$, only two random numbers need to be generated for all NMOS transistors (and another two for all PMOS transistors) in the same circuit, namely parameters A and B, characterized by (3). With these two random numbers and the central coordinates of each transistor i, the term $\Delta P_{\mathrm{grad}\_i}$ is given by

$$\Delta P_{\mathrm{grad}\_i} = A x_i + B y_i. \qquad (4)$$
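The per-die procedure described above can be sketched as follows. This is a minimal illustration, not a definitive implementation: it assumes Gaussian distributions for all random terms, a random component with standard deviation A_P/sqrt(2 W_i L_i), and a gradient component A x_i + B y_i with σ(A) = σ(B) = S_P. The function name `sample_die`, its argument layout, and the numeric values are hypothetical.

```python
import math
import random


def sample_die(transistors, p_mean, sigma_global, a_p, s_p, rng):
    """Return one simulated value of parameter P per transistor for one die.

    transistors : list of (x, y, W, L) tuples (centre coordinates and sizes)
    """
    dp_global = rng.gauss(0.0, sigma_global)  # shared by every device on the die
    a = rng.gauss(0.0, s_p)                   # gradient plane slope along x
    b = rng.gauss(0.0, s_p)                   # gradient plane slope along y
    values = []
    for x, y, w, l in transistors:
        # Independent random term per transistor (assumed Gaussian)
        dp_rand = rng.gauss(0.0, a_p / math.sqrt(2.0 * w * l))
        # Gradient term: the same two slopes are reused for every transistor
        dp_grad = a * x + b * y
        values.append(p_mean + dp_global + dp_rand + dp_grad)
    return values


# Hypothetical three-transistor layout; one call corresponds to one Monte Carlo die
layout = [(0.0, 0.0, 5.0, 5.0), (50.0, 0.0, 5.0, 5.0), (0.0, 80.0, 5.0, 5.0)]
die_values = sample_die(layout, 0.5, 0.01, 0.01, 0.002, random.Random(42))
```

In a real flow, one would call such a routine once per simulated die and per mismatch parameter, with separate slope pairs for NMOS and PMOS devices as noted above.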
Note that this way of interpreting Pelgrom's distance term cancels out gradient effects when using common-centroid layout techniques. Assume a differential pair where each transistor is split into two. Assume that transistors 1 and 4 in parallel form the first equivalent transistor of the differential pair, and transistors 2 and 3 in parallel form the second one. If they are laid out in a common-centroid configuration, their respective (central) coordinates satisfy

$$x_1 + x_4 = x_2 + x_3, \qquad y_1 + y_4 = y_2 + y_3,$$

i.e., both equivalent transistors share the same centroid.
Applying (4) to this differential pair will result in a gradient-induced mismatch of ∆P grad_14−23 = (∆P grad_1 + ∆P grad_4 ) − (∆P grad_2 + ∆P grad_3 ) = 0, which is consistent with Pelgrom's distance term prediction for common-centroid layouts [1] [see (19) in Appendix A].
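The cancellation can be illustrated with a small numerical sketch. It assumes a gradient term of the form A x + B y per unit transistor and a hypothetical cross-coupled placement around a centroid (x0, y0); the coordinates, helper names, and slope values are illustrative assumptions only.

```python
import random


def grad_term(a, b, pos):
    """Gradient-induced deviation of one unit transistor centred at pos."""
    return a * pos[0] + b * pos[1]


def pair_gradient_mismatch(a, b, first, second):
    """Gradient mismatch between two composite transistors.

    first, second : lists of centre coordinates of the unit transistors
    that are connected in parallel to form each composite device.
    """
    return (sum(grad_term(a, b, p) for p in first)
            - sum(grad_term(a, b, p) for p in second))


# Hypothetical cross-coupled placement around the centroid (x0, y0)
x0, y0, hx, hy = 50.0, 20.0, 5.0, 3.0
t1, t4 = (x0 - hx, y0 - hy), (x0 + hx, y0 + hy)
t2, t3 = (x0 + hx, y0 - hy), (x0 - hx, y0 + hy)

rng = random.Random(0)
a, b = rng.gauss(0.0, 0.002), rng.gauss(0.0, 0.002)

common_centroid = pair_gradient_mismatch(a, b, [t1, t4], [t2, t3])
side_by_side = pair_gradient_mismatch(a, b, [t1, t3], [t2, t4])
```

For any slopes A and B, `common_centroid` vanishes up to rounding, whereas the side-by-side grouping of the same four devices retains a term proportional to A.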
III. NONPHYSICAL INTERPRETATION
Shortly after Pelgrom's seminal paper was published, Michael et al. developed a means to implement the distance term in (1) [9]-[11], which is called sigma-space analysis or design. This methodology has been further developed [12]. However, in this approach, the distance term is not considered to model the statistics of the possible gradient planes from die to die. Instead, it is considered that within the same die, there is another random component per transistor $\Delta P_{d\_i}$, such that when computing the difference between two transistors separated by a distance $D_{ij}$, the standard deviation of this difference obeys

$$\sigma^2(\Delta P_{d\_i} - \Delta P_{d\_j}) = S_P^2 D_{ij}^2. \qquad (5)$$
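In principle, there could be many ways to generate random numbers for an arbitrary number of transistors in a circuit satisfying (5). The way proposed in [9] is as follows. Consider N transistors. Then, for each of them, we compute

$$\begin{aligned}
\Delta P_{d\_1} &= 0 \\
\Delta P_{d\_2} &= A_{22} R_2 \\
\Delta P_{d\_3} &= A_{32} R_2 + A_{33} R_3 \\
&\;\;\vdots \\
\Delta P_{d\_N} &= A_{N2} R_2 + A_{N3} R_3 + \cdots + A_{NN} R_N
\end{aligned} \qquad (6)$$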
where $R_i$ is a normally distributed random number with zero mean and unit standard deviation. Coefficients $A_{ji}$ are computed by applying (5) to each transistor pair. Since there are N(N − 1)/2 transistor pairs, there are N(N − 1)/2 nonlinear (quadratic) equations to solve for the N(N − 1)/2 parameters $A_{ji}$ in (6). If the equations resulting from (5) were linear, there would be a unique solution. However, since they are nonlinear, we should expect many possible solutions. The computational cost of solving these equations grows exponentially with the number of transistors. Obviously, the random gradient plane solution discussed in Section II [see (4)] should be one of the solutions of the formulation of (6), since both interpretations satisfy the distance statistics of (5). However, the computational cost of (3) and (4) is much smaller than that of (5) and (6), especially when the circuit contains a large number of transistors. Surprisingly, a more detailed analysis of the solutions of (6) (see Appendix B) reveals that the gradient plane of Section II is the only possible solution.
IV. FURTHER REFINEMENTS FOR IMPLEMENTING GRADIENT-INDUCED MISMATCH IN A CAD TOOL
So far, we have seen that implementing Pelgrom's distance-dependent mismatch through a single random gradient plane or through the formulation of (6) is equivalent, although the latter is much more expensive computationally. Consequently, both formulations model Pelgrom's distance term by gradient planes. This is correct as long as the gradients on the wafers have spatial constants much larger than the die size. However, this assumption does not always hold. For example, in Fig. 2, we show measurements of 50 NMOS transistors of size 5 × 5 µm² spaced evenly over a distance of 2.6 mm, fabricated in a 0.35-µm CMOS process. We can clearly see both the random mismatch component from transistor to transistor [which is the one modeled by the size-dependent term in Pelgrom's model of (1)] and the gradient component. The central smooth curve in Fig. 2 was computed by fitting the experimental data to a third-order polynomial. Note that the gradient between transistors 20 and 50 is fitted very well by a linear plane. Transistors 1 to 20 could also be fitted by a linear plane. However, all 50 transistors together do not fit well into a single gradient plane. This means that for this technology, dies larger than approximately 2 mm may have gradients that are not fitted well by planes. Dies of 2-mm size are usually considered small. In modern designs, it is very common to fabricate large dies of 1 cm² and more. However, it is also true that the analog part (and therefore the most mismatch-sensitive part of the die) usually occupies a small fraction of the die. Consequently, this analog part can hopefully be modeled using gradient planes. Nonetheless, in modern submicrometer technologies, where mismatch effects are very strong and digital circuits may need to consider their effect, a single gradient plane for the whole die might not be a good approximation. It would result in overly pessimistic gradient-induced mismatch predictions.
In this case, one might consider tiling the die into several gradient planes, with continuous transitions from plane to plane. Another option, if one has available wafer maps from the foundry, would be to place each die in random positions within the wafer. However, foundries rarely provide such data.
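The tiling idea can be pictured with a deliberately simplified sketch, reduced to one dimension for brevity: each tile gets its own slope, and the profile is accumulated across tile boundaries so that neighboring tiles join continuously. Everything here is an assumption made for illustration, including the tile size, the choice of independent Gaussian slopes per tile, and the function name; the text above does not prescribe how the tile slopes should be drawn.

```python
import random


def piecewise_gradient(tile_edges, slopes):
    """Build a continuous, piecewise-linear gradient profile f(x).

    tile_edges : increasing tile boundary positions, length len(slopes) + 1
    slopes     : one gradient slope per tile
    """
    # Profile value at each tile edge, accumulated so that adjacent tiles meet
    knots = [0.0]
    for k, slope in enumerate(slopes):
        knots.append(knots[-1] + slope * (tile_edges[k + 1] - tile_edges[k]))

    def profile(x):
        for k, slope in enumerate(slopes):
            if tile_edges[k] <= x <= tile_edges[k + 1]:
                return knots[k] + slope * (x - tile_edges[k])
        raise ValueError("x lies outside the tiled region")

    return profile


rng = random.Random(3)
edges = [0.0, 1000.0, 2000.0, 3000.0]  # hypothetical 1-mm tiles, in micrometers
slopes = [rng.gauss(0.0, 0.002) for _ in range(len(edges) - 1)]
grad = piecewise_gradient(edges, slopes)
```

A two-dimensional version would need a continuity constraint along every shared tile edge, which restricts how freely the individual planes can be chosen.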
Another interesting observation regarding gradient mismatch modeling is the following. Looking carefully at the steps of the theoretical derivation of Pelgrom's model (Appendix A), at some point [between (16) and (17)], it is assumed that the spatial constant of the gradients ($D_w$) is much larger than the transistor sizes and the intertransistor distances. This is true if $D_w$ is of the order of the wafer size. However, observations like the one in Fig. 2 reveal that this spatial constant could be much smaller (of the order of a few millimeters). This has two consequences. The first is that if one estimates constant $S_P$ in (1) from its theoretical expression [see (18)], one should not use the wafer diameter for parameter $D_w$ but rather a length of the order of a few millimeters. The second is that for transistor sizes or intertransistor distances approaching $D_w$, the approximations made after (16) in the theoretical derivation of Pelgrom's model no longer hold. This implies that we may no longer expect the random mismatch component and the gradient component to decouple into two independent terms, as in (1).
V. CONCLUSION
We have shown that the distance-dependent term ($S_P^2 D^2$) in Pelgrom's mismatch model is equivalent to considering, for each die, a random gradient plane Ax + By such that $\sigma(A) = \sigma(B) = S_P$. We have shown that the solution of the formulation known as sigma-space analysis for predicting this mismatch term in a CAD tool is precisely this random plane. However, the mathematical formulation used there is extremely complex and its cost grows exponentially with the number of transistors, making it nonviable for moderate and large circuits. In contrast, the random-plane physical interpretation of this mismatch component results in a very simple mathematical formulation, which is very easy to implement in a CAD tool and incurs almost no computing penalty. Hints on refining gradient-induced mismatch in a CAD tool are also given.
APPENDIX A
PELGROM MODEL DERIVATION
Fig. 3 shows two transistors of size W × L located at coordinates $(x_1, y_1)$ and $(x_2, y_2)$, respectively. Let us define the position of the pair as its middle point $x_{12} = (x_1 + x_2)/2$, $y_{12} = (y_1 + y_2)/2$. Let us assume that property P of a transistor can be obtained by averaging a certain density function $\mathcal{P}$ over its area

$$P = \frac{1}{W L} \iint_{\mathrm{area}} \mathcal{P}(x', y')\, dx'\, dy'. \qquad (7)$$
The density function P(x , y ) is assumed to reflect wafer gradients as well as pure random components. Under these assumptions, the mismatch in property P for the transistor pair located at (x 12 , y 12 ) is given by
where G is "1" when (x, y) is inside the area of the transistor centered at $(x_1, y_1)$, G is "−1" inside the area of the transistor centered at $(x_2, y_2)$, and G is "0" elsewhere. In general, G is a geometry function that depends on the specific layout of the transistor pair. Taking the Fourier transform of (8) yields

$$\Delta P(\omega_x, \omega_y) = G(\omega_x, \omega_y)\, \mathcal{P}(\omega_x, \omega_y) \qquad (9)$$

where $G(\omega_x, \omega_y)$ is the Fourier transform of G(x, y), and $\mathcal{P}(\omega_x, \omega_y)$ is the one of $\mathcal{P}(x, y)$. For the layout of Fig. 3, it can be shown that
For a pair of transistors in a common-centroid configuration, as shown in Fig. 4 , it would be
Fig. 5 shows a wafer in which contour lines of constant property P have been drawn. In the wafer, at coordinate $(x_{12}, y_{12})$, a pair of transistors is drawn. Assuming that when averaging $\Delta P(x_{12}, y_{12})$ all over the wafer we have ∆P | Wafer ≈ 0, we can write that
where Ω is the area of the wafer. Applying Poisson's theorem to (12) results in
Let us now make the following assumption:
where P o is a constant (frequency independent) representative of white noise and W(ω x , ω y ) is a wafer map component w , where D w is the wafer diameter. Therefore, function W(ω x , ω y ) can be assumed to have a shape of the type shown in Fig. 6 , and consequently, we can assume that
Therefore, (13) can be written as
Assuming a transistor pair as in Fig. 3 , it would be
where
For a common-centroid configuration, by changing the geometry function, it can be shown that the result is
Note that the distance term has been reduced by a factor of the order of $(D_y/D_w)^2$, which is very small. Consequently, from a practical point of view, the distance term can be considered to have disappeared. Perfect gradient planes would cancel out this distance term exactly. However, for the model in Fig. 6, the gradients are not always perfect planes and might have a higher order curvature component. This is why a residual distance term remains in (19).
APPENDIX B
Let us demonstrate that the solution to the set of (5) and (6) is exactly the random gradient plane described in Section II. First, we will compute the solution for a three-transistor circuit. Then, we will show that if the result holds for a circuit with N − 1 transistors, it also holds for a circuit with N transistors. Therefore, it is true for any number of transistors.
Let us start with a circuit of three transistors and consider a generic mismatch sensitive parameter P . According to the formulation of (6)
Let us choose, without any loss of generality, a coordinate system such that the coordinates of the three transistors are (0, 0), (x 2 , 0), and (x 3 , y 3 ), respectively. Applying (5) and (20) to these three transistors results in
The solutions of this set of equations are
where s 1 = ±1 and s 2 = ±1. Consequently, there are four possible solutions, depending on the signs s 1 and s 2 (parameter S p is supposed to be always positive). Substituting (22) into (20) results in
with A = s 1 S p R 2 and B = s 2 S p R 3 . Note that (23) defines all ∆P d_i on a random plane defined by the two random parameters A and B, such that σ(A) = σ(B) = S p . Consequently, the distance mismatch solution of (5) and (6) 4 , and again, the distance mismatch solution of (5) and (6) lies on a random plane for all transistors. Let us now consider a circuit with N − 1 transistors, and let us assume that the solution also lies on a random plane. This means that A n2 = s 1 S p x n A n3 = s 2 S p y n A nk = 0, k= 4, . . . , n − 1
for n = 4, . . . , N − 1. If we now consider the general case of a circuit with N transistors, each at coordinate (x n , y n ), then following the formulation of (6) 24) is also satisfied for n = N , and again, the distance mismatch solution of (5) and (6) lies on a random plane for all transistors.
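The direction of the argument that is easy to verify numerically is that plane-like coefficients do satisfy the pairwise distance statistics. The sketch below assumes that condition (5) reads σ(ΔP_d_i − ΔP_d_j) = S_p D_ij, that R_2 and R_3 are independent with unit variance, and that all other coefficients are zero; the helper names and the layout are hypothetical. It does not demonstrate uniqueness, which is the content of the appendix.

```python
import math


def plane_coefficients(s_p, positions, s1=1, s2=1):
    """Rows (A_n2, A_n3) that place every transistor on the plane A x + B y,
    with A = s1 * S_p * R_2 and B = s2 * S_p * R_3."""
    return [(s1 * s_p * x, s2 * s_p * y) for x, y in positions]


def diff_std(row_i, row_j):
    """Std of dP_i - dP_j when R_2 and R_3 are independent, unit variance."""
    return math.sqrt(sum((ci - cj) ** 2 for ci, cj in zip(row_i, row_j)))


# Hypothetical layout with the first transistor at the origin
positions = [(0.0, 0.0), (40.0, 0.0), (10.0, 30.0), (25.0, -15.0)]
s_p = 0.002
rows = plane_coefficients(s_p, positions, s1=-1, s2=1)

# Largest deviation from S_p * D_ij over all transistor pairs
max_error = max(
    abs(diff_std(rows[i], rows[j]) - s_p * math.dist(positions[i], positions[j]))
    for i in range(len(positions))
    for j in range(i)
)
```

The result is the same for any of the four sign choices of s1 and s2, since only squared coefficient differences enter the variance.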
