Abstract-Process variability and environmental fluctuations affect the performance of digital circuits in many ways. One of them is the data processing time, whose deviations may cause errors in synchronous digital circuits through underestimated timing violations. This situation is commonly avoided by adding time margins to the clock signal, making its period larger than the nominal worst-case data process time and penalizing global performance. In this paper, a new mechanism is presented for compensating the effects of both environmental fluctuations and process parameter variations on digital circuits. The environmental compensation mechanism regenerates the clock signal for the stages of a pipelined system, adding a compensated skew component that depends on the local environmental conditions of each stage. Process variations are corrected with a calibration circuit that adjusts the clock period in every stage, taking into account its particular static deviations.
I. INTRODUCTION
With every semiconductor technology upgrade several improvements are achieved, most of them a consequence of the continued shrinking of the transistor feature size. But as the transistor channel length reaches the regime of tens of nanometers, approaching the limit of optical lithography, efficient control over the multiple device parameters becomes more difficult, leading to increased device mismatch, carrier mobility impact, non-homogeneous doping profiles and threshold voltage variability, among others. These process variabilities make the resulting integrated circuit deviate from its specifications, degrading its performance [1]. For example, according to the latest ITRS report [2], the threshold voltage will reach a variability of 112% by 2022, which is critical because the electrical characteristics and behavior of the MOS transistor depend strongly on this parameter.
In addition, the power supply delivery network increases in complexity in order to provide energy to every device in highly complex chips, but the network's parasitics do not scale at the same rate as the supply voltage does and, in conjunction with fast, high transient currents, problems like L bounce, etc. increase their proportional impact on the voltage level. For upcoming technologies these problems are expected to be aggravated [3], because the higher parallelism of larger systems, with more devices switching at the same time, increases the magnitude and frequency of the transient current peaks and hence the voltage fluctuations, making usual solutions like decoupling capacitors insufficient [4].
A similar situation arises with IC temperature fluctuations: as more complex systems perform more operations per second, more heat is generated inside the chip in a non-homogeneous way. According to the latest ITRS report [2], IC power dissipation will exhibit a rather steady tendency in the forthcoming years, saving the chip from destructive risks but not from the effects of temperature gradients on device characteristics, for example on the threshold voltage, which add dynamic fluctuations superimposed on its static process variations. For these reasons, two different cells in the same system can have slightly different temperatures with different fluctuation profiles, due to their own activity and the heat transmitted from other circuits on the chip, making them respond differently to the same stimulus.
Process variations are commonly addressed by adding extra steps to the manufacturing process to gain finer control over it and/or to perform corrections on some devices or on entire sections of the chip. Due to their static nature, corrections aimed at process variability are performed once, but for environmental factors the implemented solution has to follow their dynamic nature, which makes them more difficult to sense and compensate. These efforts are increasingly time-consuming at design time, less effective, and impose higher penalties on circuit performance. The solution proposed in this article compensates all these factors, both dynamic and static, with a comprehensive strategy based on simple built-in circuits that are adjustable to process variations and perform real-time sensing and compensation of dynamic environmental fluctuations.
The paper is organized as follows. Section II-A presents the proposed mechanism for compensating the environmental factors, voltage and temperature. Section II-B describes how this compensation method is extended to take process variation into account by means of extra calibration circuitry. The advantages of the compensating mechanisms are analyzed through a practical example, a carry ripple adder, in Section III. Finally, Section IV presents the conclusions of this work.
II. COMPENSATION MECHANISMS
The proposed compensation method is divided into two mechanisms, each of which takes into account the nature of the compensated phenomena: the dynamic fluctuations of temperature and voltage, and the static effects of process variability on circuits. The first one, focused on environmental fluctuations, is described in subsection II-A, and the part related to process variability in subsection II-B.
A. First mechanism: Local temperature and voltage fluctuations compensation
The proposed temperature and voltage compensating mechanism consists of placing a chain made of an even number of inverters in the clock path between the input and output registers of a given circuit, as shown in Fig. 1. Both the logic circuit and the inverter chain are merged in the same layout in order to be affected by the same temperature and voltage fluctuations, defined as shown in Eq. 1 and 2 respectively,
where V_n(t) is the actual voltage of the circuit, ∆V_DD(t) and ∆V_SS(t) are the time-dependent fluctuations on each power rail, V_DD and V_SS their nominal values, T_m(t) the local temperature, T_m the nominal temperature and ∆T_n(t) its time-dependent fluctuations. The length of the inverter chain is calculated to match the clock period under nominal environmental conditions (chosen according to the largest nominal data process time). Because the inverter chain and the logic circuit are attached to the same pair of power supply nodes and embedded in the same layout, every change in the environmental conditions affects them in a similar way: it deviates the data process time from its nominal value by ∆t_p(V_n, T_m) and regenerates the clock signal with an additional skew component ∆t_ck(V_n, T_m), both of which depend on the characteristics of these changes. The inserted skew will be very similar to the deviation of the data process time (Eq. 3), preserving the time margin between the clock edge and the data arrival, as shown in Fig. 1. Inverters were chosen as chain components because they can be merged into the main circuit's layout without greatly affecting it and they provide fine control over the total inserted skew.
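As a reading aid, one form of the relations referred to as Eq. 1 to 3 that is consistent with the symbol definitions above is sketched here. The exact expressions are an assumption reconstructed from those definitions (in particular the sign convention between the two power rails), not a quotation.

```latex
\begin{align*}
V_n(t) &= \bigl(V_{DD} + \Delta V_{DD}(t)\bigr) - \bigl(V_{SS} + \Delta V_{SS}(t)\bigr) \\
T_m(t) &= T_m + \Delta T_n(t) \\
\Delta t_{ck}(V_n, T_m) &\approx \Delta t_p(V_n, T_m)
\end{align*}
```

The first two lines express the local supply voltage and temperature as nominal values plus time-dependent fluctuations; the third states that the skew inserted by the inverter chain tracks the deviation of the data process time.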
Even better performance is achieved when this mechanism is implemented in multiple-stage pipelined systems, as shown in Fig. 2: the clock signal clk_2 at the output of the second inverter chain carries the algebraic sum of its own generated skew and the skew generated in the previous stage. In general form, the clock signal at the i-th stage is the sum of the original clock signal clk and all the generated skews ∆t_pi from the first stage to the current i-th stage, as stated in Eq. 4.
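Written out from the verbal description above, the accumulated clock at stage i can be sketched as follows. The summation index j is introduced here only for illustration.

```latex
\begin{equation*}
clk_i = clk + \sum_{j=1}^{i} \Delta t_{pj}
\end{equation*}
```

For i = 2 this reduces to the original clock plus the skews of the first and second inverter chains, matching the description of clk_2.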
Because the clock regeneration follows a domino mechanism, it produces a compensation effect, owing to the partially random nature of the fluctuations. In the conventional worst-case design approach, the clock period clk_wc is chosen to be larger than the largest nominal data process time t_pn, taking into account the maximum possible deviation max ∆t_pn provoked by all the factors involved, as shown in Eq. 5. In this proposal, by contrast, the clock for the i-th stage (clk_i) is dynamically adapted to the instantaneous conditions of that stage and of the previous ones (Eq. 4).
The compensation is performed with no additional control signals or feedback paths, which would make it slower; there is no latency, few extra steps are added to the design process, and no extraordinary effort must be dedicated to designing the compensator. For larger systems with a larger number of stages better results are expected, because the time saved in each stage reduces the total clock periods needed by the system to complete data processing. The drawbacks of the technique are the overhead and the extra power consumption introduced. The resulting circuit operates locally asynchronously while the whole system operates synchronously.
A more extensive description of this mechanism, with further analysis and advantages, is presented in [5].
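The contrast between the worst-case period and the stage-by-stage regenerated clock can be illustrated with a small numerical sketch. It is not the authors' simulation setup: the uniform distribution of the deviations, the function names and the default values are all assumptions chosen for illustration.

```python
import random


def worst_case_period(t_pn, max_dev):
    """Worst-case style clock period: nominal process time plus maximum deviation."""
    return t_pn + max_dev


def regenerated_clock_arrivals(t_pn, deviations):
    """Arrival time of the regenerated clock edge after each stage.

    Each inverter chain delays the edge by the nominal time plus the
    instantaneous deviation of its own stage, so the skews accumulate.
    """
    arrivals = []
    t = 0.0
    for dev in deviations:
        t += t_pn + dev
        arrivals.append(t)
    return arrivals


def simulate(num_stages=8, t_pn=1.0, max_dev=0.2, seed=0):
    """Return (adaptive_total, worst_case_total) for one random scenario.

    Assumption: stage deviations are independent and uniform in
    [-max_dev, +max_dev]; real fluctuations are only partially random.
    """
    rng = random.Random(seed)
    deviations = [rng.uniform(-max_dev, max_dev) for _ in range(num_stages)]
    adaptive_total = regenerated_clock_arrivals(t_pn, deviations)[-1]
    worst_total = num_stages * worst_case_period(t_pn, max_dev)
    return adaptive_total, worst_total


if __name__ == "__main__":
    adaptive, worst = simulate()
    print(f"adaptive: {adaptive:.3f}  worst case: {worst:.3f}")
```

Because positive and negative deviations partly cancel along the chain, the accumulated adaptive time stays at or below the worst-case total in this model, and the gap grows with the number of stages.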
B. Second mechanism: Adjusting compensation for process variability
In the first compensation mechanism only fluctuations of environmental factors were considered, but static process variability affecting the inverter chain and the logic circuit may introduce deviations in the nominal timing parameters of both circuits, so that the time margin between the regenerated clock edge and the data arrival may be reduced or enlarged. An even worse scenario may arise when environmental conditions fluctuate and interact with a static variation, for example in the case of the threshold voltage, which fluctuates dynamically due to its time-varying temperature dependence, as shown in Eq. 6, where V_th is the nominal threshold voltage and ∆V_th(xy, T_m) the dynamic fluctuations from this nominal value.
V_th(xy, T_m) = V_th + ∆V_th(xy, T_m)    (6)
If process variations enlarge the time margin between data and clock, some of the inverters in the chain unnecessarily generate extra power consumption and penalize circuit speed; on the other hand, if the time margin is too small or is even exceeded by the data arrival, metastability or data loss may occur. In order to avoid this situation, an adjusting mechanism is proposed which fits the number of inverters according to the effects of process variability on the circuit and on the inverter chain itself. The technique incorporates the scan path mechanism in order to gain controllability and observability over the inputs and outputs of every stage, and to enable test and normal operation modes. The scan path allows test vectors corresponding to the worst-case path of each stage to be loaded. When the circuit enters the calibration phase, it performs the operations shown in Fig. 3 (the calibration block diagram is shown in Fig. 4).
Once the calibration of the final stage is accomplished, the system enters normal operation mode with a fitted compensation circuit, ready to effectively mimic the effects of local temperature and voltage while also including the process variation. The calibrating circuit is shown in Fig. 4 for just one stage. This process is performed once at system start-up, but it can be programmed to run periodically according to the system characteristics.
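The calibration of a single stage can be sketched as a simple loop. This is one possible reading of the procedure, not the flow of Fig. 3 itself: the callback `stage_has_error`, the starting chain length, the upper bound and the guard pair parameter are hypothetical names introduced for illustration, and the sketch assumes calibration starts from a deliberately short chain and only adds inverters.

```python
def calibrate_stage(stage_has_error, start_inverters, max_inverters, guard_pairs=1):
    """Return the calibrated number of inverters for one pipeline stage.

    stage_has_error(n) is a hypothetical callback: it is assumed to load the
    worst-case test vector through the scan path, clock the stage with n
    inverters in the chain, and report whether the captured output is wrong.
    """
    if start_inverters % 2 != 0:
        raise ValueError("the chain must contain an even number of inverters")

    n = start_inverters
    # Lengthen the chain one inverter pair at a time until the worst-case
    # vector is captured correctly.
    while stage_has_error(n):
        n += 2
        if n > max_inverters:
            raise RuntimeError("calibration failed: chain length limit reached")

    # Extra pair(s) keep the regenerated clock edge away from the data edge,
    # reducing the risk of metastability.
    return n + 2 * guard_pairs


if __name__ == "__main__":
    # Toy stage model (assumption): 50 ps per inverter, 1000 ps worst-case data time.
    too_short = lambda n: n * 50 < 1000
    print(calibrate_stage(too_short, start_inverters=10, max_inverters=40))
```

In a pipelined system the same routine would be run stage by stage, in order, before the system switches to normal operation mode.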
Observe that after adjusting the number of inverter cells, the edges of the regenerated clock signal and of the data signal may be very close to each other and metastability may arise; to avoid it, the algorithm adds a pair of inverters to the chain even before the error detection step is performed, forcing the system to include 
III. A PRACTICAL EXAMPLE: CARRY RIPPLE ADDER
To illustrate the advantages of these compensation mechanisms, a carry ripple adder was chosen as a testbench because its pipelined structure makes it easy to identify the contribution of every stage to the data process time deviations. The inverter chain was embedded in the adder basic cell (ABC) itself, as shown in Fig. 5. Doing so presents two advantages: a) the new cell can be added to technology libraries and used in automated synthesis, and b) process variations and environmental fluctuations can be tracked very closely thanks to this proximity. The largest nominal data process time of the ABC corresponds to the carry output (C_o in Fig. 5) when the input switches between 101 ⇐⇒ 100. This output has been measured for −10% V_n ≤ ∆V_n ≤ 10% V_n and 25 °C ≤ T_m ≤ 125 °C, and inverters were added until Eq. 7 was fulfilled:
∆t_p−e ≤ m ∆t_not−e    (7)
where ∆t_p−e stands for the deviations of the data process time, ∆t_not−e for those of the inverter (NOT) process time and m for the number of inverters. The added inverters enlarge the layout area by a factor of almost 2.1 for a demi-full-custom approach. With this compensated ABC a 16-bit carry ripple adder was implemented. The inverter chain must be adjusted in order to fit the deviations provoked by process variation and avoid their effects on the time margin effectively held by the inverter chain. In this particular case, the adjusting mechanism needs to check only the correctness of the most significant bit and add inverter pairs until the condition stated in Eq. 8 is fulfilled,
where n is the number of bits, t_p is the nominal ABC process time, ∆t_p−p are the deviations due to process variations, t_c is the time of the inverter chain embedded in the basic cell, ∆t_c−p are the deviations of this time due to process variations, ∆t_not is the inverter (NOT) process time and, finally, a is the number of cells needed to balance both sides of the equation, which may be positive or negative depending on the effects of process variations on the circuits. For this example, Monte Carlo simulations have been performed, showing that the process variations can be effectively compensated by modifying the number of inverters per ABC within the range of 18 to 24 (20 nominal).
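The two sizing conditions can be illustrated with integer arithmetic. The first function applies Eq. 7 directly. The second relies on an assumed form of Eq. 8, namely n(t_p + ∆t_p−p) ≤ n(t_c + ∆t_c−p) + a ∆t_not, which is inferred from the symbol definitions above and may differ from the actual expression. Function names, parameter names and the picosecond values are hypothetical.

```python
def min_even_inverters(dev_data_ps, dev_not_ps):
    """Smallest even m with dev_data_ps <= m * dev_not_ps (condition of Eq. 7).

    Inputs are integer picoseconds; dev_not_ps must be positive.
    """
    m = -(-dev_data_ps // dev_not_ps)  # ceiling division on integers
    if m % 2:
        m += 1  # the chain must keep an even number of inverters
    return m


def correction_cells(n_bits, t_p, dev_tp, t_c, dev_tc, t_not):
    """Smallest integer a satisfying the ASSUMED form of Eq. 8:

        n_bits * (t_p + dev_tp) <= n_bits * (t_c + dev_tc) + a * t_not

    A negative result means inverters can be removed from the chain.
    All times are integer picoseconds; t_not must be positive.
    """
    gap = n_bits * (t_p + dev_tp) - n_bits * (t_c + dev_tc)
    return -(-gap // t_not)  # ceiling division, valid for negative gaps too


if __name__ == "__main__":
    print(min_even_inverters(95, 10))
    print(correction_cells(16, 100, 5, 100, -5, 10))
    print(correction_cells(16, 100, -5, 100, 5, 10))
```

The sign of the second result reflects the two cases discussed for process variation: a slow logic path or fast chain requires extra cells, while the opposite situation allows some to be removed.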
IV. CONCLUSIONS
In this paper we have introduced a strategy to compensate environmental and process variations, key factors limiting the performance of ICs built in modern CMOS technologies. The strategy is based on a clock regenerating mechanism: a chain of inverters reproduces the clock signal for the receiving latches of a circuit stage. As the inverters are affected by the same environmental fluctuations as the stage's processing logic, an efficient compensation is performed, allowing clock safety margins to be reduced and thereby improving circuit performance. The mechanism is calibrated to compensate the effects of process variations in the circuit.
