We propose a novel fine-grained scheme to compensate for within-die variations in dynamic logic to reduce the variation in leakage, delay and noise margin using both keeper control and body-biasing. We first show that the amount of compensation needed depends on the correlation that exists between gates, and then analytically show the possible reduction in the variance of the leakage and delay of both a single and multiple dynamic logic gates. We then design circuits to implement the system which provides the reduction in the variance of the leakage, delay and noise margin of dynamic logic gates and show that it produces a close match to the analytical results. In one design the variance of the leakage of 169 gates is reduced by 27% and the variance of the path delay is reduced by 39%.
INTRODUCTION
CMOS scaling has been driven by the desire for higher transistor densities and faster devices. As scaling has continued into the nanometer regime, process variations in circuit parameters such as transistor channel length and transistor threshold voltage have increased [1]. These increased process variations can have a significant effect on circuit performance and power [2].
Historically, to cope with intrinsic variability, Integrated Circuit (IC) designers have designed circuits with the worst-case process variations in mind [3]. However, designing at the worst-case process corner leads to excessive guard-banding, so more recent work uses adaptive circuit techniques, in which on-chip control circuits monitor the process variations within circuit devices and change the characteristics of those devices [4, 5, 6]. These techniques, however, are usually implemented at the chip level or block level.
One specific type of circuit topology that is extremely sensitive to process variations is dynamic logic [7], which is usually used in high-performance parts of microprocessors and other VLSI circuits [8].
In the past, a weak keeper, which did not impact performance significantly, was sufficient to maintain the dynamic node voltage [8]. However, with the exponential increase in leakage currents, keepers must be made larger to offset the worst-case leakage through the pull-down network, thus reducing the performance advantage of dynamic gates over other circuit topologies. Also, because dynamic logic can be sensitive to highly local variations, chip-level or block-level techniques cannot provide the compensation required to adjust its leakage and performance.
An adaptive circuit technique that is particularly useful for dynamic logic is keeper control, where the keeper strength is changed to account for differences in process parameters [9]. The scheme in [9] uses three keepers for each dynamic logic gate, where each keeper can be turned off or on to provide a variable keeper strength. However, since the technique uses a digital scheme to determine the keeper strength, there is a large overhead associated with producing and routing signals at a local level. The control is therefore performed at the chip level, and local process variations are not compensated.
We propose a fine-grained adaptive circuit technique that trades off noise margin and/or leakage for performance post-fabrication to reduce variability. The control scheme is referred to as fine-grained because it is done locally in a small neighborhood on the die, and because it is done using a continuous analog signal rather than a discrete digital signal. By reducing the variability, the keeper can be down-sized, leading to increased performance.
The rest of this paper is organized as follows: Section 2 provides an overview of the dynamic compensation scheme. Section 3 then provides a framework for finding the optimal amount of compensation. The design of circuits to provide the compensation is presented in Section 4. Section 5 provides the results for the compensation scheme, and finally Section 6 concludes.
Technology
All simulation results reported in this paper are based on HSPICE, using Berkeley Predictive Technology Models (BPTM) [10] for a 70nm technology. The transistor models were expanded to include gate tunneling leakage, which was modeled using a combination of four voltage-controlled current sources, as in [11]. All simulations presented were performed on a four-input dynamic NOR. The simulations were performed at 110 °C, where leakage, delay, and noise margin are all more critical than at low temperatures.
OVERVIEW
To compensate for variations in the leakage, performance and noise margin of dynamic logic gates, we use both keeper control and body biasing to change the characteristics of the dynamic logic gate in response to underlying variations, as shown in Figure 1.
Body biasing, via controlled changes to $V_{bs}$, can compensate for variations by changing the threshold voltage, $V_{tn}$, of the pull-down transistors [4]. Because body bias changes the threshold voltage of the pull-down transistors directly, $V_{bs}$ might be expected to compensate simultaneously for all the effects that a varying $V_{tn}$ has on a dynamic gate, including leakage, delay and noise margin. Simulation showed otherwise: while applying a body bias $V_{bs} = f(\cdot)$ can virtually eliminate the variation in leakage, a different function $V_{bs} = f_1(\cdot) \neq f(\cdot)$ is needed to compensate for variations in delay, and a third function is needed to compensate for variations in noise margin.
To work around the need for different functions, keeper control, via controlled changes to $V_k$, is used to compensate for variations in delay after body bias has been used to compensate for variations in leakage; $V_k$ has virtually no impact on leakage in a correctly functioning gate. While variations in noise margin will not be fully compensated (a third control signal would be needed), they are reduced by the combination of body bias and keeper control.
We obtain $V_{bs}$ and $V_k$ from monitoring circuits, which measure the process variations and produce a body bias and keeper voltage that provide the compensation. The monitoring circuits are designed and laid out to look like an actual functioning dynamic gate, so that systematic Within-Die (WID) variations within the monitor circuit are correlated with those within the functioning gate.
The monitor circuits produce a change in $V_{bs}$ and $V_k$ based on the actual variations in that chip; we refer to the functional dependence of $V_{bs}$ and $V_k$ on the variations as transfer functions. To determine the transfer functions, the effect of $V_{tn}$, $V_{tp}$, $V_{bs}$ and $V_k$ on the leakage, delay, and noise margin of a dynamic gate was determined through simulation. Leakage has an exponential dependence on $V_{tn}$ and $V_{bs}$ but very little dependence on $V_{tp}$ and $V_k$ (the keeper is ON when the gate is not switching). The delay and noise margin of the dynamic gate both have a near-linear dependence on $V_{tn}$, $V_{tp}$, $V_{bs}$ and $V_k$. $V_{tn}$ has a stronger effect on both the delay and the noise margin than $V_{tp}$.
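One way to extract sensitivity coefficients of this kind from simulation sweeps is an ordinary least-squares fit in the log domain. The sketch below is illustrative only: it assumes the log-linear leakage form ln(I/I_nom) = b·ΔV_tn + a·ΔV_bs, and the function name, sweep points and coefficient values are hypothetical rather than taken from the HSPICE data used in this paper.

```python
import math


def fit_log_leakage_sensitivities(samples):
    """Least-squares fit of ln(I / I_nom) = b * dvtn + a * dvbs.

    samples: iterable of (dvtn, dvbs, leakage_ratio) tuples, where
    leakage_ratio = I / I_nom from a (hypothetical) simulation sweep.
    Returns the pair (b, a).
    """
    s_nn = s_nb = s_bb = s_ny = s_by = 0.0
    for dvtn, dvbs, ratio in samples:
        y = math.log(ratio)
        s_nn += dvtn * dvtn
        s_nb += dvtn * dvbs
        s_bb += dvbs * dvbs
        s_ny += dvtn * y
        s_by += dvbs * y
    # Solve the 2x2 normal equations with Cramer's rule.
    det = s_nn * s_bb - s_nb * s_nb
    if abs(det) < 1e-30:
        raise ValueError("sweep does not vary dvtn and dvbs independently")
    b = (s_ny * s_bb - s_by * s_nb) / det
    a = (s_by * s_nn - s_ny * s_nb) / det
    return b, a


# Synthetic sweep generated from made-up coefficients (b = -25 /V, a = +4 /V).
sweep = [(dn, db, math.exp(-25.0 * dn + 4.0 * db))
         for dn in (-0.03, 0.0, 0.03)
         for db in (-0.1, 0.0, 0.1)]
b_fit, a_fit = fit_log_leakage_sensitivities(sweep)
```

Because the synthetic data follow the assumed model exactly, the fit recovers the made-up coefficients; with real simulation data the residual would indicate how well the log-linear form holds.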
OPTIMAL COMPENSATION
In this section we provide the mathematical framework for determining the optimal compensation for reducing the variation in leakage and delay. To simplify the analysis, the effect of $V_{tp}$ is initially ignored.
Variations normally have three components: a die-to-die component, a within-die systematic component, and a within-die random component. The term "systematic" refers to the parts of the variations which have some correlation across the die, while the "random" component refers to the parts of the variations that are totally independent. Compensation for die-to-die variations has been discussed in the literature. In this work, we focus on the within-die systematic variations. Compensation for random variations remains a topic of future work.
Threshold voltage variations have a within-die random component arising from random dopant fluctuations, and a within-die systematic component arising from systematic variations in length [12] (and, of course, from any systematic deliberately applied variations in body voltage that are introduced by our monitoring circuits).
Compensating for Leakage
Leakage has an exponential dependence on both $V_{tn}$ and $V_{bs}$ (the dependence on $V_{tn}$ is much stronger than the dependence on $V_{bs}$), and it can be written as
$$I = I_{nom}\, e^{\,b_I \Delta V_{tn0} + a_I \Delta V_{bs}}$$
where $\Delta V_{tn0}$ is the variation of the threshold voltage of the gate of interest and $b_I$ and $a_I$ are sensitivity coefficients obtained through simulation. This last equation can also be written as $\Delta \ln I = \ln(I/I_{nom}) = b_I \Delta V_{tn0} + a_I \Delta V_{bs}$. Let $\Delta V_{tn1}$ be the variation in the threshold voltage of the MOSFETs of the monitoring circuit itself. If $\Delta V_{tn0}$ and $\Delta V_{tn1}$ are totally correlated (i.e. $\Delta V_{tn0} = \Delta V_{tn1}$), then it is clear that the transfer function that completely eliminates the variation in leakage is
$$\Delta V_{bs} = -\frac{b_I}{a_I}\,\Delta V_{tn1}.$$
We will call this transfer function the basic transfer function. As we will see below, when the correlation is not total, other transfer functions will be required, effectively providing less compensation than this basic transfer function.
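If we assume that the distributions of $\Delta V_{tn0}$ and $\Delta V_{tn1}$ are Gaussian with means 0 and variances $\sigma_{n0}^2$ and $\sigma_{n1}^2$, and generalize the transfer function to $\Delta V_{bs} = -\frac{b_I}{a_I^*}\,\Delta V_{tn1}$, then $\Delta \ln I = b_I \Delta V_{tn0} - \frac{a_I}{a_I^*}\, b_I \Delta V_{tn1}$; its mean can easily be computed to be 0 and its variance to be
$$\mathrm{Var}[\Delta \ln I] = b_I^2 \left( \sigma_{n0}^2 + \left(\frac{a_I}{a_I^*}\right)^2 \sigma_{n1}^2 - 2\,\frac{a_I}{a_I^*}\,\rho_{n0,n1}\,\sigma_{n0}\sigma_{n1} \right)$$
where $\rho_{n0,n1}$ is the correlation between the dynamic gate and the monitor. Taking the above equation and differentiating with respect to $a_I^*$, it is found that there is a minimum at
$$\frac{a_I}{a_I^*} = \rho_{n0,n1}\,\frac{\sigma_{n0}}{\sigma_{n1}}.$$
Thus, depending on the correlation between the variations in threshold voltage in the monitor and the functioning gate, there is an optimal amount of under-compensation from the basic transfer function to minimize the variance of the log of the relative leakage of a gate. Since $\ln(\cdot)$ is a monotonically increasing function, the value $a_I/a_I^*$ that minimizes the variance of $\Delta \ln I$ also minimizes the variance of $I$. When using the optimal amount of under-compensation, the variance of the log of the relative leakage becomes
$$\mathrm{Var}[\Delta \ln I] = b_I^2\,\sigma_{n0}^2\left(1 - \rho_{n0,n1}^2\right)$$
which is never higher than the variance of the uncompensated gate, $b_I^2\sigma_{n0}^2$, and is strictly lower whenever the correlation is nonzero.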
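The optimum above can be checked numerically. The following is a minimal sketch, not the authors' code: `k` is a hypothetical name for the fraction of the basic transfer function that is applied (k = 1 is the basic transfer function, k = 0 is no compensation), and the sensitivity, standard deviations and correlation are made-up values. It compares the closed-form variance of the log-leakage with a Monte-Carlo estimate drawn from correlated Gaussian threshold shifts.

```python
import math
import random


def log_leakage_variance(k, b_i, sigma0, sigma1, rho):
    """Closed-form Var[dlnI] for dV_bs = -k * (b_i / a_i) * dV_tn1.

    a_i cancels, leaving dlnI = b_i * (dV_tn0 - k * dV_tn1).
    """
    return b_i ** 2 * (sigma0 ** 2 + (k * sigma1) ** 2
                       - 2.0 * k * rho * sigma0 * sigma1)


def optimal_scale(sigma0, sigma1, rho):
    """Scale factor k that minimizes the closed-form variance."""
    return rho * sigma0 / sigma1


def monte_carlo_variance(k, b_i, sigma0, sigma1, rho, n=100_000, seed=1):
    """Sample variance of dlnI from correlated Gaussian threshold shifts."""
    rng = random.Random(seed)
    total = total_sq = 0.0
    for _ in range(n):
        z0, z1 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        dvtn0 = sigma0 * z0
        # Build the monitor shift so that corr(dvtn0, dvtn1) = rho.
        dvtn1 = sigma1 * (rho * z0 + math.sqrt(1.0 - rho * rho) * z1)
        dln_i = b_i * (dvtn0 - k * dvtn1)
        total += dln_i
        total_sq += dln_i * dln_i
    mean = total / n
    return total_sq / n - mean * mean


# Hypothetical values: sensitivity in 1/V, standard deviations in V.
B_I, SIGMA, RHO = -25.0, 0.03, 0.8
k_opt = optimal_scale(SIGMA, SIGMA, RHO)
var_uncomp = log_leakage_variance(0.0, B_I, SIGMA, SIGMA, RHO)
var_basic = log_leakage_variance(1.0, B_I, SIGMA, SIGMA, RHO)
var_opt = log_leakage_variance(k_opt, B_I, SIGMA, SIGMA, RHO)
var_mc = monte_carlo_variance(k_opt, B_I, SIGMA, SIGMA, RHO)
```

With these made-up inputs, full (basic) compensation still reduces the variance relative to no compensation, but the under-compensated setting reduces it further, which is the behaviour the analysis predicts when the correlation is below 1.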
Compensating for Delay
The delay through a domino logic gate, $D$, has an approximately linear dependence on $V_{tn}$, $V_k$ and $V_{bs}$. Thus $\Delta D = b_D \Delta V_{tn0} + a_D \Delta V_{bs} + c_D \Delta V_k$, where $b_D$, $a_D$ and $c_D$ are constants found through simulation. Since the body bias is constructed as $\Delta V_{bs} = f(\Delta V_{tn1}) = B\,\Delta V_{tn1}$, where $B$ is a constant found from the analysis in Section 3.4.1 to minimize the variation in leakage, the change in delay becomes
$$\Delta D = b_D \Delta V_{tn0} + a_D B\,\Delta V_{tn1} + c_D \Delta V_k.$$
Then, a second monitoring circuit is used to produce the needed keeper voltage, $\Delta V_k = g(\Delta V_{tn2})$, where $\Delta V_{tn2}$ is the variation in $V_{tn}$ of the second monitoring circuit. $\Delta V_{tn2}$ is introduced separately from $\Delta V_{tn1}$, the variation in $V_{tn}$ of the monitor used to compensate for variations in leakage, to keep the discussion of the impact of using monitors general.
If $\Delta V_{tn0}$, $\Delta V_{tn1}$ and $\Delta V_{tn2}$ are totally correlated, then the basic transfer function needed to eliminate the variation in $\Delta D$ is
$$\Delta V_k = -\frac{b_D + a_D B}{c_D}\,\Delta V_{tn2}.$$
We assume, as before, that the distributions of $\Delta V_{tn0}$, $\Delta V_{tn1}$ and $\Delta V_{tn2}$ are Gaussian with mean 0 and variances $\sigma_{n0}^2$, $\sigma_{n1}^2$ and $\sigma_{n2}^2$. When there is no compensation, the mean and variance of the delay can easily be found to be $E[\Delta D] = 0$ and
$$\mathrm{Var}[\Delta D] = b_D^2\,\sigma_{n0}^2.$$
When there is compensation we will define $\Delta V_k$ as
$$\Delta V_k = -\frac{b_D + a_D B}{c_D^*}\,\Delta V_{tn2}$$
to allow for a discussion of how different transfer functions affect the distribution of delay after compensation. Thus, when using compensation, the mean of $\Delta D$ can be shown to be 0, and its variance, which is not shown for clarity, can be minimized with respect to $c_D^*$.
As seen previously when minimizing the variation of the leakage, if the correlation between the monitors and the functioning gate is less than 1, under-compensating results in a lower variance in the delay.
When using the optimal amount of under-compensation, and if $\rho_{n1,n2}$ approaches 1 and $\rho_{n0,n1} = \rho_{n0,n2}$, then the variance of the delay becomes
$$\mathrm{Var}[\Delta D] = b_D^2\,\sigma_{n0}^2\left(1 - \rho_{n0,n1}^2\right)$$
which is very similar to the equation for the variance of the leakage. Given that the two monitors will be placed very close together and will have nearly the same distance to the functioning gate, the above assumptions are warranted.
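The compensated delay variance that the text omits can be written out under the stated Gaussian assumptions. The following is a reconstruction for illustration, not the authors' expression: $\kappa$ is a hypothetical scale factor on the basic keeper transfer function ($\kappa = 1$ is the basic transfer function), $G = b_D + a_D B$ is shorthand, and $\sigma_{n0}$, $\sigma_{n1}$, $\sigma_{n2}$ denote the standard deviations of the three threshold-voltage variations.

```latex
\begin{align*}
\Delta V_k &= -\kappa\,\frac{G}{c_D}\,\Delta V_{tn2}, \qquad G = b_D + a_D B,\\
\Delta D &= b_D\,\Delta V_{tn0} + a_D B\,\Delta V_{tn1} - \kappa G\,\Delta V_{tn2},\\
\mathrm{Var}[\Delta D] &= b_D^2\sigma_{n0}^2 + a_D^2 B^2\sigma_{n1}^2 + \kappa^2 G^2\sigma_{n2}^2
   + 2\,a_D B\,b_D\,\rho_{n0,n1}\,\sigma_{n0}\sigma_{n1}\\
 &\quad - 2\kappa G\left(b_D\,\rho_{n0,n2}\,\sigma_{n0}\sigma_{n2}
   + a_D B\,\rho_{n1,n2}\,\sigma_{n1}\sigma_{n2}\right),\\
\frac{\partial\,\mathrm{Var}[\Delta D]}{\partial \kappa} = 0
 \;&\Rightarrow\;
 \kappa_{opt} = \frac{b_D\,\rho_{n0,n2}\,\sigma_{n0} + a_D B\,\rho_{n1,n2}\,\sigma_{n1}}{G\,\sigma_{n2}}.
\end{align*}
```

As a consistency check, setting $\rho_{n1,n2} = 1$, $\rho_{n0,n1} = \rho_{n0,n2} = \rho$ and equal standard deviations $\sigma$ gives $\kappa_{opt} G = b_D\rho + a_D B$, so $\Delta D = b_D(\Delta V_{tn0} - \rho\,\Delta V_{tn2})$ and the variance collapses to $b_D^2\sigma^2(1-\rho^2)$, the same form as the leakage result.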
Considering Vtp variations
When $V_{tp}$ variations are introduced into the analysis, the monitor producing $V_{bs}$ should be fairly insensitive to them, since the leakage of the dynamic gate is itself quite insensitive to $V_{tp}$.
For the monitor producing $V_k$, incorporating $V_{tp}$ variations in our delay model necessitates a few changes. First, the change in delay becomes $\Delta D = b_{Dn} \Delta V_{tn0} + b_{Dp} \Delta V_{tp0} + a_D \Delta V_{bs} + c_D \Delta V_k$. Also, since it is very difficult to make the monitor that produces $V_{bs}$ completely insensitive to $V_{tp}$, the body bias becomes $\Delta V_{bs} = B_n \Delta V_{tn1} + B_p \Delta V_{tp1}$, where $B_p$ would ideally be 0.
If $\Delta V_{tn0}$, $\Delta V_{tn1}$ and $\Delta V_{tn2}$ are totally correlated, and $\Delta V_{tp0}$, $\Delta V_{tp1}$ and $\Delta V_{tp2}$ are also totally correlated, then the appropriate transfer function for $V_k$ to eliminate the variation due to $V_{tn}$ and $V_{tp}$ is
$$\Delta V_k = -\frac{(b_{Dn} + a_D B_n)\,\Delta V_{tn2} + (b_{Dp} + a_D B_p)\,\Delta V_{tp2}}{c_D}.$$
With this equation, the variance of the uncompensated and compensated system can be determined much as before, but the resulting expressions have too many terms to present readably. A similar characteristic arises: as the correlation between the monitor and the functioning gate decreases, increased under-compensation provides the lowest variance.
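The cancellation under total correlation can be sketched in a few lines. This is an illustrative check only: the function names are hypothetical, the coefficient values are made up (expressed in ps/V for the delay sensitivities), and the keeper gains are simply the values that zero the linear delay model $\Delta D = b_{Dn}\Delta V_{tn0} + b_{Dp}\Delta V_{tp0} + a_D\Delta V_{bs} + c_D\Delta V_k$ when the gate and both monitors see identical variations.

```python
def keeper_transfer_coefficients(b_dn, b_dp, a_d, c_d, big_b_n, big_b_p):
    """Return (g_n, g_p) with dV_k = g_n * dV_tn2 + g_p * dV_tp2.

    The gains cancel dD when gate and monitor variations are totally
    correlated and dV_bs = big_b_n * dV_tn1 + big_b_p * dV_tp1.
    """
    g_n = -(b_dn + a_d * big_b_n) / c_d
    g_p = -(b_dp + a_d * big_b_p) / c_d
    return g_n, g_p


def delay_change(dvtn, dvtp, coeffs, gains):
    """dD for totally correlated variations (gate == monitor 1 == monitor 2)."""
    b_dn, b_dp, a_d, c_d, big_b_n, big_b_p = coeffs
    g_n, g_p = gains
    dv_bs = big_b_n * dvtn + big_b_p * dvtp
    dv_k = g_n * dvtn + g_p * dvtp
    return b_dn * dvtn + b_dp * dvtp + a_d * dv_bs + c_d * dv_k


# Made-up coefficients: (b_Dn, b_Dp, a_D, c_D, B_n, B_p).
COEFFS = (200.0, 50.0, -30.0, 100.0, 0.6, 0.05)
GAINS = keeper_transfer_coefficients(*COEFFS)
residual = delay_change(0.02, -0.01, COEFFS, GAINS)
uncompensated_keeper = delay_change(0.02, -0.01, COEFFS, (0.0, 0.0))
```

With the keeper gains applied, the residual delay change vanishes up to floating-point rounding, whereas body bias alone (keeper gains of zero) leaves a nonzero delay shift.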
Optimum Compensation for Many Gates
When a monitor controls a group of gates, the under-compensation that provides the minimum variance for the total leakage and path delay must be determined.
Let the total leakage of a group of gates, I T , be defined as 
Equation 6 can now be minimized numerically with respect to $a_I^*$; in this study we have made the following reasonable assumptions to simplify the computation:
1. $\sigma_{ni} = \sigma_n$ for all $i$ (i.e. the variance of the underlying $V_{tn}$ variations in all transistors is the same).
2. The correlation between the $V_{tn}$ variations approaches 1 as the distance between two transistors is lowered, and approaches 0 as the distance gets larger.
3. The monitor is placed at the centre of a square of gates.
4. All other gates $i$ are placed around the monitor.
For the variance of the path delay we follow a similar derivation. Let the total delay of a critical path, $\Delta D_T$, be defined as $\Delta D_T = \Delta D_{-M} + \Delta D_{-M+1} + \cdots + \Delta D_0 + \cdots + \Delta D_{M-1} + \Delta D_M$. The mean of $\Delta D_T$ is 0, and its variance is
where $\sigma_{D_i}$ is the standard deviation of the delay of gate $i$ discussed above. To simplify the analysis of the above equation and compute the value of $c_D^*$ that minimizes the variance of the delay for a path of 13 gates, we use very similar simplifying assumptions to those used above to find the $a_I^*$ that minimizes the total leakage. Now we replace $\rho_{i,j}$ in (6) and (7) with $f_\rho(d(i, j))$, where $f_\rho(\cdot)$ is the correlation function described in item 2 above and $d(i, j)$ is the distance between gates $i$ and $j$.
Solving for the Optimal under-compensation
The final part in allowing for a numerical solution is choosing the function $f_\rho$. We have chosen to model the correlation function as $f_\rho(x) = e^{-x^2/(2S^2\delta^2)}$, where $x$ is the distance between the two logic gates in question, $\delta$ is a measure of the separation (or pitch) between two adjacent gates, and $S$ is a measure of how quickly the correlation between gates decreases as the distance between them increases. Notice that $f_\rho(\cdot)$ looks very much like the Gaussian distribution, but it is not being used as a distribution. For practical purposes, one can think of $3S\delta$ as the largest distance for which the correlation between two transistors is not negligible. Again, the usefulness of the analysis is not limited by using this specific function, but it allows us to obtain a numerical solution.
Fig. 2 shows the optimum under-compensation needed and the corresponding reduction in the variance of the total leakage for an area of gates, for different numbers of gates controlled by the monitors, for $S = 3$, $S = 4$ and $S = 10$ in $f_\rho(\cdot)$. If the number of gates is low, the compensation that minimizes the variance is near the basic compensation, and the variance of the leakage is almost eliminated compared to an uncompensated system. As the monitor controls more gates, a reduced compensation is needed and the variance of the leakage rises, though it always remains lower than that of an uncompensated system. At 169 gates, which comprise a $13\delta \times 13\delta$ area, the optimal amount of under-compensation when $S = 3$ ($S = 4$, $S = 10$) is 0.32 (0.48, 0.88), where the variance of the total leakage is reduced by 27% (49.7%, 93.3%).
A similar analysis was performed for the variance of the delay, and is shown in Figure 3. Similar to the leakage results, at low gate counts the optimal compensation is near the basic compensation, where the variance of the delay is reduced considerably. As the gate count increases, the compensation needed is reduced, and the relative variance increases. The variance of the compensated system, however, never increases above that of the uncompensated system. For a path delay composed of 13 gates, the optimal amount of under-compensation when $S = 3$ ($S = 4$, $S = 10$) is 0.7 (0.9, 0.92), where the variance of the total delay is reduced by 46% (63%, 69.5%).
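The kind of numerical search described above can be illustrated with a deliberately simplified model. The sketch below is not the computation behind Figures 2 and 3: it assumes every gate contributes a linear term (ΔV_i − k·ΔV_monitor) with equal standard deviation, a single control signal, a monitor co-located with the central gate, and δ = 1. Under those assumptions the optimum has a closed form, so no iterative minimization is needed. The function names and the scale factor `k` are hypothetical, and the variance figures it produces should not be expected to match the paper's, which also account for the exponential leakage model and the two-monitor scheme.

```python
import math


def f_rho(x, s, delta=1.0):
    """Gaussian-shaped correlation versus distance x (not a probability density)."""
    return math.exp(-(x * x) / (2.0 * s * s * delta * delta))


def optimal_group_compensation(positions, monitor, s, delta=1.0):
    """Linearized sketch: every gate sees the same monitor-driven correction.

    With equal sigma, the variance of the summed terms is proportional to
    sum_ij rho_ij - 2*k*N*sum_i rho_im + k^2 * N^2, which is minimized at
    k = sum_i rho_im / N.
    Returns (k_opt, remaining_variance_fraction).
    """
    n = len(positions)
    to_monitor = sum(f_rho(math.dist(p, monitor), s, delta) for p in positions)
    pairwise = sum(f_rho(math.dist(p, q), s, delta)
                   for p in positions for q in positions)
    k_opt = to_monitor / n
    remaining = 1.0 - to_monitor ** 2 / pairwise
    return k_opt, remaining


# 13 x 13 block of gates and a 13-gate path, monitor at the centre (assumed).
grid = [(x, y) for x in range(-6, 7) for y in range(-6, 7)]
path = [(x,) for x in range(-6, 7)]
k_grid, rem_grid = optimal_group_compensation(grid, (0, 0), s=3)
k_path, rem_path = optimal_group_compensation(path, (0,), s=3)
k_grid_s10, _ = optimal_group_compensation(grid, (0, 0), s=10)
```

The qualitative trend matches the discussion above: the more slowly the correlation decays (larger S), the closer the optimal scale factor is to the basic compensation, and a monitor serving many gates calls for more under-compensation than one serving a few.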
MONITOR DESIGN
Now that the basic amount of compensation and the optimum under-compensation are known, the monitors can be designed to produce the required transfer functions. The transistor-level design of the monitors must meet some requirements, including (1) a similar topology to that of a dynamic gate to maximize correlations; (2) a transfer function equal to or close to the required one for both $V_{tn}$ and $V_{tp}$; (3) an average output level in the correct operating range (near 0V for $V_{bs}$, near 0.5V for $V_k$); (4) a minimal amount of power consumption.
To find the monitor that provided the required transfer function for an area of 169 gates (13 × 13), a number of circuits were tested and one was found that met the requirements for $V_{bs}$ very well; the required compensation with respect to $V_{tn}$ is matched almost exactly, and there is very little variation in the monitor's output with changes in $V_{tp}$. The average bias output by the circuit is a little lower than optimal, but the negative body bias produced reduces the average leakage of the gates, with very little performance impact, as will be seen below. The requirements for the monitors and their resulting characteristics can be seen in Table 1.
For the monitor that produces the signal for $V_k$, a circuit could not be found that met the requirements exactly; the chosen monitor has a transfer function that provides a compensation that is much lower than what is required with respect to both $V_{tn}$ and $V_{tp}$. While monitors were tested that had larger amounts of compensation and that more closely matched the requirement for compensation with respect to $V_{tn}$, these monitors either had a compensation with respect to $V_{tp}$ that was much higher than required or had an average bias that was too close to $V_{DD}$ or $V_{SS}$, where the keeper would not function appropriately. The monitor that provided a lower amount of compensation than optimal was used since that was the more conservative choice. The transistor-level schematics of the monitors are shown in Figure 4. Both monitors have the same topology; the only difference is that the monitor producing $V_{bs}$ has a $V_{SS}$ of -0.5V. Since the monitor producing $V_k$ does not produce the optimum under-compensation required, there will be less of a reduction in the variance of the path delay for 13 gates; instead of reducing the variance by 46%, the variance is reduced by 41%.
RESULTS
To validate that the designed monitors provide the reduction in variance predicted by the analysis in Section 3, Monte-Carlo (MC) analysis was performed on the circuitry. The testbench consisted of one functioning gate and both monitors. The MC analysis was performed with different correlation coefficients between the functioning gate and the monitors. If the reduction in variance obtained closely matches the theoretical reduction in the variance of the leakage and delay of single gates at different correlation coefficients shown in Section 3, then the results for the reduced variance of the leakage and delay of a group of gates, shown in Section 3.4.1, are validated without the need to do MC analysis with a large number of gates, which would be computationally expensive. The MC analysis was first done with no PMOS variations and no variations between the different transistors within the pull-down network of an individual gate. Then PMOS variations were included, and finally variations between the different pull-down transistors within a single gate were added.
Figure 5 shows the reduction in the variance of leakage when using compensation and compares it to the theoretical reduction in variance. The match is very close under the different MC simulation scenarios. When including the power drawn by the monitors providing $V_{bs}$ and $V_k$, which is comparable to the leakage power of 21 and 24 dynamic gates respectively, the mean total leakage power will be reduced if more than 121 gates are controlled by the monitors, since the mean leakage of a dynamic gate is reduced by the average negative body bias provided by the monitor. The worst-case leakage is reduced when the monitors con-
Figure 6 also shows a good match to the theoretical reduction in the variance of delay.
When PMOS variations are also included, the match to the theoretical curve gets slightly worse, since the monitor does not have precisely the required degree of under-compensation with respect to $V_{tp}$. Finally, when the transistors within the pull-down network vary among themselves, the match is further degraded. This is because the delay of the gate depends on the single pull-down transistor that is activated, whereas for leakage both the leakage and the monitor output depend on all pull-down transistors. When all variations are included, the design provides a 39% reduction in the variance of the path delay. Figure 7 shows the reduction in the variance of the noise margin compared to that of an uncompensated gate. At high correlation coefficients the variance of the noise margin is reduced by 74%, and at a correlation coefficient of 0 there is no improvement.
All the analysis and simulations performed thus far have been at a high temperature of 110 °C, since the leakage is higher, the delay longer and the noise margin lower at this temperature compared to lower temperatures.
As the temperature is decreased, the effect on the compensation system can qualitatively be thought of as an increase in $V_{tn}$, which decreases the leakage; thus the monitor transfer function tries to increase the leakage and reduce the delay. The system does not work exactly this way, however, since the transfer function of the monitor is not purely a function of the subthreshold leakage.
After performing the MC simulations at low temperatures, the greatest change is in the mean of the leakage, delay and noise margin compared to the low-temperature mean for the uncompensated gate. At 27 °C the leakage of the compensated gate is 21% larger than that of the uncompensated gate at low temperatures, but since it is only a tenth of the leakage of an uncompensated gate at high temperatures, it is not much of an issue. The compensated gate is 5% faster than the uncompensated gate. The only concern is the noise margin, which decreases from high temperatures to low temperatures and is 5% lower than that of the uncompensated gate, but this is not a problem since the noise margin is still 7% larger than the noise margin of the uncompensated gate at high temperatures.
With only one monitor
To differentiate between the effect of compensation when using body bias and keeper control together and when using them separately, a set of simulations was performed in which only body biasing was used. As before, the MC simulations for the reduction in the variance of leakage showed almost the same results, since keeper control has very little effect on leakage. However, without keeper control the changes in the variance of delay and noise margin were different. Figure 8 shows the reduction in variance for both delay and noise margin when using body bias alone; the variance of delay was reduced by around 40% with high correlation, and there was no reduction when the correlation was 0. For the noise margin, at high correlations there is a near 50% reduction in the variance, and in the worst case there is a 15% reduction in the variance. When used with multiple gates, a system with body bias alone would reduce the variance of the path delay of 13 gates by 33% (37%, 39%) when assuming $S = 3$ ($S = 4$, $S = 10$).
Thus, compared to using both keeper control and body biasing, body biasing alone provides a slightly smaller reduction in the variance of the delay, as expected, but does provide better compensation for noise margin. Furthermore, the power overhead would be lower since only one monitor is used.
CONCLUSION
We have analytically shown the reduction in the variance of leakage of dynamic logic gates that is possible with compensation, and then designed circuits to implement the system. The designed circuits provide a reduction in the variance of the leakage, delay and noise margin of dynamic logic gates and provide a close match to the analytical results. In our design the variance of leakage is reduced by 27% and the variance of the path delay is reduced by 39%.
