Abstract-In this work we propose a self-adaptive clock based on a ring oscillator (RO) as a solution to the increasing uncertainty in the critical path delay. This growing uncertainty forces designers to add larger safety margins to the clock period, which degrades circuit performance. We evaluate three self-adaptive clock systems: a free running ring oscillator, an infinite impulse response (IIR) filter controlled RO and a TEAtime controlled RO. The safety margin reduction achieved by the three alternatives is investigated under different clock distribution delays, dynamic variation frequencies and degrees of mismatch between the ring oscillator, the critical paths and the delay sensors.
I. INTRODUCTION
Modern digital systems rely on synchronous circuit architectures. In any synchronous circuit the clock is the most critical signal and its period has to be carefully selected. The clock period has to be long enough to accommodate the critical path (CP) delay plus the set-up time and the clock-to-output delay of the registers. Since the delay of every logic gate has an uncertainty component due to process, voltage, temperature and aging (PVTA) variations, a safety margin (SM) has to be added to the clock period. This margin ensures correct operation of the synchronous system: the larger the margin, the less likely the chip is to fail. However, the introduction of SMs represents a loss in performance. Alternatively, the SM can be added to the supply voltage instead of the clock period. In this case the yield is increased, but at the price of higher power consumption.
PVTA variations can be classified as static or dynamic and spatially homogeneous or heterogeneous. Table I classifies the most common variations following this taxonomy.
The margin added to the clock period or supply voltage has to be carefully determined. PVTA variations produce a delay uncertainty which is hard or, in some cases, impossible to predict. Different techniques, such as corner analysis and statistical static timing analysis (SSTA), are used to estimate the safety margin that, once added to the clock period or the supply voltage, produces a desired yield.
As the minimum transistor size shrinks, the uncertainty due to process variations increases and aging effects become more important [1], [2]. In addition, the transistor size reduction allows the integration of more functions on the die. This increase in complexity leads to an escalation in the number of possible CPs. More CPs impose a larger SM to satisfy a given yield [3].
Table I. Classification of the most common PVTA variations.
Homogeneous, static:
• Die-to-die (D2D) process variations.
Homogeneous, dynamic:
• Voltage regulation module (VRM) ripple.
• Room temperature variations.
• Off-chip voltage drops.
Heterogeneous, static:
• Within-die (WID) process variations.
• Device-to-device random (RND) process variations.
Heterogeneous, dynamic:
• Simultaneous switching noise (SSN).
• IR drop.
• Temperature hotspots.
• Ageing.
Smaller transistors make it possible to integrate more functions on the same die, which increases the variability of the power demand. This uncertainty in the power consumed by the circuit blocks makes it more difficult to estimate supply voltage variations such as IR drop or simultaneous switching noise. Transistor miniaturization may also allow voltage regulator modules to be integrated on the die, a step that is expected to induce more supply voltage ripple than off-die regulators [4].
In addition, the temperature of the circuit can vary with the computation being carried out, since the current demanded by its different blocks depends on the executed instructions. On top of this, the temperature also depends on the environment in which the chip operates.
As the amount of uncertainty grows and its estimation during the design stage consumes more and more resources, a new paradigm in synchronous circuit design is needed. Some authors have proposed adapting the clock period to PVTA variations [5], [6]. In this article we propose a new approach in this field: the self-adaptation of the clock period to ensure the correct operation of any synchronous circuit under PVTA variations. The self-adaptive clock varies its period to prevent timing violations across the die.
In section II we show the natural capability of ring oscillators (RO) to act as self-adaptive clock sources that can cope with PVTA variations, and we review their weaknesses. In section III a closed loop control architecture for the ring oscillator is proposed in order to cope with these weaknesses. In section IV the simulation results are presented, and the advantages and disadvantages of the closed loop controlled RO compared with a free running ring oscillator and an increment controlled RO are discussed. Finally, in section V the conclusions are presented.
II. RO ADAPTATION TO PVTA VARIATIONS
A ring oscillator (RO) is an oscillating circuit: a chain of inverting and non-inverting stages, where the number of inverting stages is odd and the chain output is connected to the chain input.
Owing to their oscillating nature, ROs can be used to generate clock signals but, in digital systems, they are not used for this purpose because of their high sensitivity to PVTA variations. This sensitivity is normally regarded as a source of clock period uncertainty.
The RO sensitivity to PVTA variations can instead be the key to building a clock source whose period adapts to the environmental conditions of the circuit. This change of perspective allows us to stop relying on fixed clock sources such as PLLs and to reduce the safety margins added to the clock period and/or the supply voltage during the design stages. As a side effect, the resources and time spent estimating PVTA effects on the CP delay during design are also reduced.
Let us assume an ideal case in which the RO is built from gates similar to those in the candidate CPs, suffers the same PVTA variations as all the candidate CPs, and the clock distribution is instantaneous (Fig. 1). Under these naive assumptions the RO-generated clock adapts its period to the instantaneous delay of the gates on the die. In this way the SM can be reduced with respect to a fixed clock, whose period is independent of process and operating conditions. Unfortunately, these conditions do not hold in reality.
A. RO limitations
The limitations of RO clock generation are caused by the mismatch between the PVTA variations affecting the RO gates and those affecting the other gates of the circuit, especially the CPs.
This mismatch can be caused by the heterogeneity of the variations across the die, such as within-die (WID) process variations, different IR drops on V_dd, temperature differences across the chip, etc. The mismatch can also arise when the variations are homogeneous but have a dynamic component. The clock generated by the RO needs to be distributed across the die through a clock distribution network (CDN). The CDN imposes a delay, t_clk, between the generated and the delivered clock signals. The period of the delivered clock, at the end of the CDN, is therefore adapted to the variations that occurred t_clk earlier, not to those present at that instant.
Against heterogeneous variations, static or dynamic, the RO can fail to reduce the safety margin. The RO adapts its period to the environmental conditions around it; in effect, it acts as a point sensor.
Homogeneous dynamic variations (HoDV) can be partially addressed by the RO clock. The mismatch between the clock period and the CP delay, considering only an HoDV ν(t), depends on the period of the dynamic variation, T_ν, and on the clock distribution delay, t_clk, introduced by the CDN. The mismatch between the RO and a CP, Δν(t, t_clk), will be equal to:
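Since the delivered clock period reflects the variation that was present t_clk earlier, one consistent way to write this mismatch is as the difference between the current and the delayed variation. The sign convention below is an assumption made here for illustration:

```latex
\Delta\nu(t, t_{clk}) = \nu(t) - \nu(t - t_{clk})
```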
1) Periodic HoDV: Considering the HoDV as a periodic function, ν(t) = ν_0 sin(2πt/T_ν + φ), the worst mismatch due to an HoDV will be equal to:
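Taking the mismatch as the difference between the current variation and the one seen t_clk earlier (an assumed form, consistent with the CDN delay described above), the sum-to-product identity gives a worst case that agrees with the T_ν/6 boundary quoted in the text:

```latex
\nu(t) - \nu(t - t_{clk})
  = \nu_0 \left[ \sin\!\left(\tfrac{2\pi t}{T_\nu} + \varphi\right)
               - \sin\!\left(\tfrac{2\pi (t - t_{clk})}{T_\nu} + \varphi\right) \right]
  = 2\nu_0 \sin\!\left(\tfrac{\pi t_{clk}}{T_\nu}\right)
          \cos\!\left(\tfrac{2\pi t}{T_\nu} - \tfrac{\pi t_{clk}}{T_\nu} + \varphi\right)

\max_t \left| \nu(t) - \nu(t - t_{clk}) \right|
  = 2\nu_0 \left| \sin\!\left(\tfrac{\pi t_{clk}}{T_\nu}\right) \right|
```

Requiring this worst case to stay below ν_0, the margin a fixed clock would need, amounts to |sin(π t_clk / T_ν)| < 1/2.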
The boundary that limits the safety margin reduction for a periodic HoDV is t_clk < T_ν/6 or (n − 1/6)T_ν < t_clk < (n + 1/6)T_ν, where n ≥ 1 is an integer. Within these constraints the use of an RO reduces the required safety margin. If t_clk does not fulfil these constraints, the adaptive clock needs more safety margin than the fixed clock, as can be seen in Fig. 2.
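The boundary can be checked numerically. The sketch below is only an illustration: it assumes the mismatch is ν(t) − ν(t − t_clk) for a unit-amplitude sinusoid, and the function names and sample count are hypothetical choices, not taken from the paper.

```python
import math


def worst_mismatch_periodic(t_clk, period, nu0=1.0, samples=20000):
    """Largest |nu(t) - nu(t - t_clk)| over one period of a sinusoidal HoDV."""
    worst = 0.0
    for i in range(samples):
        t = period * i / samples
        now = nu0 * math.sin(2.0 * math.pi * t / period)
        # Variation that shaped the clock period now arriving at the registers.
        seen_by_clock = nu0 * math.sin(2.0 * math.pi * (t - t_clk) / period)
        worst = max(worst, abs(now - seen_by_clock))
    return worst


def ro_reduces_margin(t_clk, period, nu0=1.0):
    """True when the RO clock needs less margin than a fixed clock (which needs nu0)."""
    return worst_mismatch_periodic(t_clk, period, nu0) < nu0


if __name__ == "__main__":
    period = 600.0  # chosen so that period / 6 = 100
    for t_clk in (50.0, 99.0, 101.0, 300.0, 590.0):
        print(t_clk, round(worst_mismatch_periodic(t_clk, period), 3),
              ro_reduces_margin(t_clk, period))
```

With a period of 600, delays just below 100 still reduce the margin, delays just above do not, and a delay of 590 falls in the first island around one full perturbation period.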
2) Single event HoDV: For a single event, such as a fast voltage drop across the whole die, assuming a triangular shape with duration T_ν and amplitude ν_0, the mismatch due to the CDN delay will be equal to:
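For a symmetric triangular pulse (assumed here: a linear rise to ν_0 over T_ν/2 followed by a linear fall), the largest difference between the current and the delayed variation grows linearly with t_clk until it saturates at the pulse amplitude. Under that assumption the worst mismatch can be written as:

```latex
\max_t \left| \nu(t) - \nu(t - t_{clk}) \right| =
\begin{cases}
  \dfrac{2\,\nu_0\, t_{clk}}{T_\nu}, & t_{clk} < \dfrac{T_\nu}{2} \\[2ex]
  \nu_0, & t_{clk} \ge \dfrac{T_\nu}{2}
\end{cases}
```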
If t_clk is larger than half of the event duration, the safety margin needed by a fixed clock system and by the RO is the same. Under this condition there is therefore no reason to use the adaptive system, as can be seen in Fig. 2. In the harmonic case there are zero-mismatch islands along the CDN delay axis. These islands, which sit at multiples of the perturbation period, alternate with zones where the RO introduces more mismatch than the perturbation itself. In the single event case, when t_clk < T_ν/2 the RO reduces the perturbation, whereas when t_clk > T_ν/2 the RO does not increase the mismatch but introduces the same mismatch as the HoDV perturbation alone.
Eqs. 2 and 3 express the trade-off between the CDN delay and the maximum dynamic variation frequency that can be tolerated for a given maximum mismatch. This trade-off relates the maximum frequency of the dynamic variation not only to the CDN delay but also to the clock domain size, since the latter is directly related to the CDN delay.
In short, the free running RO can be useful to reduce the safety margins introduced to cope with homogeneous static variations (HoSV) and with HoDV whose period is much longer than t_clk. In section III we propose a more complex architecture to cope with heterogeneous variations, which the RO cannot counteract, and to improve the adaptation to dynamic variations.
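The single event behaviour can also be checked numerically. This is a minimal sketch that assumes a symmetric triangular pulse and a mismatch of ν(t) − ν(t − t_clk); the helper names and sampling grid are hypothetical.

```python
def triangle(t, duration, nu0=1.0):
    """Symmetric triangular pulse: zero outside (0, duration), peak nu0 at duration / 2."""
    if t <= 0.0 or t >= duration:
        return 0.0
    half = duration / 2.0
    return nu0 * (t / half if t <= half else (duration - t) / half)


def worst_mismatch_single_event(t_clk, duration, nu0=1.0, samples=4000):
    """Largest |nu(t) - nu(t - t_clk)| while the pulse or its delayed copy is active."""
    span = duration + t_clk
    worst = 0.0
    for i in range(samples + 1):
        t = span * i / samples
        diff = triangle(t, duration, nu0) - triangle(t - t_clk, duration, nu0)
        worst = max(worst, abs(diff))
    return worst


if __name__ == "__main__":
    for t_clk in (10.0, 25.0, 50.0, 80.0):
        print(t_clk, round(worst_mismatch_single_event(t_clk, 100.0), 3))
```

For an event of duration 100 the worst mismatch grows linearly up to a delay of 50 and then stays at the pulse amplitude, which is the margin a fixed clock would need anyway.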
III. CLOSED LOOP CONTROLLED RO
To cope with the spatial heterogeneity of the PVTA variations we propose to distribute sensors all over the clock domain. As sensors we propose time-to-digital converters (TDCs) [7]. Every clock cycle, a TDC outputs the number of gates, or stages, crossed by an alternating signal during the last period. This integer gives a measure of the delay suffered by the gates near each TDC. A low TDC output means that the logic gates are experiencing extra delay due to the variations, and vice versa when the output is high. The stages of the TDCs and of the RO are assumed to be identical. This equivalence will not hold in reality; for this reason we have to take into account heterogeneous static and dynamic variations.
Once sensors are placed on the core we can compare, at each period, the worst sensor output τ, that is, the lowest output among all the TDCs, with a given set-point c and then act on the RO, i.e. change its length l_RO, in order to adapt the clock period to the variations that the RO cannot sense. Changing the RO length, i.e. the number of stages, changes the RO period. This closed loop controlled RO architecture is depicted in Fig. 3. τ is related to the logic depth a signal can traverse in one period under the PVTA conditions in the area around the TDC. When τ < c the period is too short and a logic error may occur. When τ > c no error occurs, but there is a potential loss in performance since the clock period is longer than necessary for the given PVTA conditions.
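The comparison step can be sketched in a few lines. The function names are hypothetical, and the error is written as δ = c − τ, the convention the text uses for the control loop; this is an illustration, not the hardware implementation.

```python
def loop_error(tdc_readings, set_point):
    """Return (tau, delta): the lowest TDC reading and the error delta = c - tau."""
    if not tdc_readings:
        raise ValueError("at least one TDC reading is required")
    tau = min(tdc_readings)  # the worst sensor is the one reporting the fewest stages
    return tau, set_point - tau


def classify(delta):
    """Interpret the sign of the error as described in the text."""
    if delta > 0:
        return "period too short"           # tau < c: a logic error may occur
    if delta < 0:
        return "period longer than needed"  # tau > c: performance is lost
    return "on target"


if __name__ == "__main__":
    tau, delta = loop_error([66, 61, 64], set_point=64)
    print(tau, delta, classify(delta))
```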
With a clock managed by a closed control loop with a set-point input, the clock period need not be set during the design stage, only the minimum and maximum number of RO stages. This relaxes the estimation of the CP delays. Once the chip is produced and running, we only need to choose the set-point c that allows the system to run without errors and/or maximizes the computation throughput. The pipeline therefore needs, at least, error detection capabilities.
In this article, delays and periods are expressed in number of stages: c, l_RO and τ_i are all measured in stages. The set-point c is the desired output of the TDCs, τ_i, and l_RO is the length of the RO.
Since every event is triggered by the clock edges, the architectural view in Fig. 3 can be translated into a discrete control system view, as shown in Fig. 4. As a first approximation we modelled the action of the RO, CDN and TDC as a simple delay chain with added perturbations that account for the heterogeneous and homogeneous variations.
When the RO length l_RO is changed, the clock period changes in the next clock cycle. This clock then has to be distributed through the CDN and takes M periods to arrive at the registers. The value of M depends on the clock period T_clk[n] at each step and on the CDN delay t_clk:
Once the clock arrives at the registers, the TDCs output the number of stages τ crossed during the last period. τ is compared with the set-point c to generate an error value δ = c − τ. δ is fed into the control filter H(z), which, after one period, gives the new l_RO.
The period of the clock generated by the RO can be influenced by a homogeneous variation e, which affects the TDCs equally. The TDCs can also be affected by a heterogeneous variation μ. When the same variation affects the RO and a TDC, the output of the TDC does not vary. Therefore, in the discrete control system scheme (Fig. 4), the perturbations in the TDC and in the RO have opposite signs. The main signals are labelled: the set-point c, which is the only input of the clock generation system, the ring oscillator length l_RO, the generated clock clk, the distributed clock clk and the i-th TDC reading τ_i. We also point out that the clock generator (the RO) and the sensors (the TDCs) could suffer different variations.
A. Control block constraints
Once the control loop is defined it is possible to find out how the two most important magnitudes, δ and l_RO, behave when the perturbations, i.e. e and/or μ, or the set-point c change.
As depicted in Fig. 4 we can derive, in the z-domain, the expressions of l_RO and δ as functions of the combined inputs p(z), assuming that H(z) = N(z)/D(z):
where
If we assume that p(z) is a Heaviside step, the desired values of δ and l_RO when t → ∞ are, respectively:
That is, under even a minimal perturbation the value of l_RO changes to counteract it (6) and, consequently, the error value δ tends to zero (7). Using the final value theorem, the constraints for N(z) and D(z) are found:
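The desired step behaviour can be illustrated with a small discrete-time simulation. Everything here is an assumption for illustration: the loop is linearized as τ[n] = l_RO[n − M − 1] − p[n], the controller is a plain integrator with gain 1/8 (a pole at z = 1 is the usual way to null a step error) rather than the filter of the paper, and M is fixed at two periods.

```python
from collections import deque


def simulate_step_response(set_point=64.0, step=8.0, cdn_periods=2,
                           gain=0.125, cycles=400, step_at=50):
    """Return a list of (l_ro, delta) pairs, one per clock cycle."""
    loop_delay = cdn_periods + 1  # one period for the RO to change plus M in the CDN
    l_ro = set_point
    # Current and recent RO lengths; index 0 is the length the TDCs observe now.
    in_flight = deque([set_point] * (loop_delay + 1), maxlen=loop_delay + 1)
    history = []
    for n in range(cycles):
        perturbation = step if n >= step_at else 0.0
        tau = in_flight[0] - perturbation  # slower gates: fewer stages crossed
        delta = set_point - tau            # error fed to the controller
        history.append((l_ro, delta))
        l_ro += gain * delta               # integrator with a power-of-two gain
        in_flight.append(l_ro)             # the oldest length drops out
    return history


if __name__ == "__main__":
    final_length, final_error = simulate_step_response()[-1]
    print(round(final_length, 6), round(final_error, 6))
```

In this sketch the RO length settles at the set-point plus the step, while the error returns to zero, which is the behaviour the constraints are meant to guarantee.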
B. Control block implementations
In this section we propose two different implementations of the control filter H(z). The first is an infinite impulse response (IIR) filter and the second a TEAtime [8], [9] implementation.
The proposed IIR control block architecture is depicted in Fig. 5. It differs slightly from a standard IIR filter because of the constraints found in sec. III-A and of some implementation constraints, such as the aim of reducing the area overhead of the clock generation circuit. First, we operate on integers, avoiding floating point operations. Second, the gain values throughout the IIR control block are constrained to powers of two in order to simplify the multiplications. Since we operate on integers we need to minimize the rounding error inside the filter; to do so, we scale the signal (k_exp and k_exp^−1). Another difference from common IIR filters is the k* gain (Fig. 5), added to ensure that the constraints in (8) are fulfilled when the IIR has more than one coefficient. We also added an extra delay after the k* gain to account for the possible need to pipeline the initial adder of the control block.
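The role of the scaling can be seen in a toy integer accumulator. This is not the filter of Fig. 5: the function name, the single power-of-two gain and the use of k_exp as a bit shift are assumptions made only to show why scaling the signal up before applying fractional gains limits the rounding error.

```python
def integer_accumulator(deltas, gain_shift, k_exp):
    """Integer accumulator with gain 2**-gain_shift; the state is kept scaled by 2**k_exp."""
    acc = 0
    outputs = []
    for delta in deltas:
        # Scale up, then apply the power-of-two gain as a right shift.
        acc += (delta << k_exp) >> gain_shift
        # Scale back down only at the output.
        outputs.append(acc >> k_exp)
    return outputs


if __name__ == "__main__":
    small_errors = [1] * 8
    print(integer_accumulator(small_errors, gain_shift=3, k_exp=0))
    print(integer_accumulator(small_errors, gain_shift=3, k_exp=4))
```

Without scaling, an error of one stage is shifted to zero and never reaches the output; with four extra bits of internal resolution the same eight errors accumulate to one full stage.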
The transfer function of the proposed IIR filter (Fig. 5) is the following:
To fulfil the constraints in (8) the filter coefficients have to follow:
For the TEAtime implementation, the control block H_TEA(z) is depicted in Fig. 6. In this case there are no parameters to set and therefore the constraints do not apply.
Fig. 6. TEAtime control block implementation, inspired by [8], [9].
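An increment-style controller of this kind can be sketched as a one-stage-per-cycle update. The exact block of Fig. 6 is not reproduced here: the hold at τ = c, the function name and the clamp to the minimum and maximum RO lengths are assumptions for illustration.

```python
def teatime_step(l_ro, tau, set_point, l_min, l_max):
    """Move l_RO by one stage according to the sign of c - tau, within [l_min, l_max]."""
    if tau < set_point:
        l_ro += 1   # period too short: lengthen the RO by one stage
    elif tau > set_point:
        l_ro -= 1   # spare margin: shorten the RO by one stage
    # tau == set_point: hold the current length (assumed behaviour)
    return max(l_min, min(l_max, l_ro))


if __name__ == "__main__":
    length = 64
    for tau in (60, 61, 63, 64, 66):
        length = teatime_step(length, tau, set_point=64, l_min=32, l_max=128)
        print(tau, length)
```

Because the step size is fixed, such a controller has no gains to tune, which matches the remark that the filter constraints do not apply to it.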
IV. ARCHITECTURE SIMULATION
To perform the simulations we used the Simulink® software from MathWorks®. We simulated the adaptive response of three different systems: the proposed IIR controlled RO, a TEAtime controlled RO and a free running RO.
The chosen gain parameters for the IIR controlled RO are:
With these values we achieve a balance between filter adaptation speed and low output ripple. The k_exp value is chosen to ensure that the minimum perturbation propagates through almost all the branches of the filter. The set-point value for all the simulations is c = 64, that is, the desired TDC reading. The amplitude of the periodic perturbation e is set to 0.2c.
Fig. 7. Upper plot: the period of the perturbation is set to 25c. In this case the safety margin is slightly reduced with respect to a fixed clock. Middle plot: the perturbation period is set to 37.5c. An appreciable reduction of the adaptation error takes place once the perturbation frequency is decreased. Lower plot: the perturbation period is set to 50c. The adaptation error is reduced to a minimum value.
A. Homogeneous dynamic variation (HoDV)
Fig. 7 shows the timing error τ − c due to HoDVs of different frequencies for a CDN delay equal to c stages. In the top plot the period of the perturbation is 25c stages, that is, 25 times the nominal period. Here the negative timing error, whose absolute value equals the required safety margin, is quite close to the margin that a fixed clock would need to ensure error free operation; nevertheless, the amplitude of τ − c is reduced.
In the middle plot of Fig. 7 the period of the perturbation is increased to 37.5c stages. As the perturbation period increases, the system can adapt the clock frequency better and, consequently, reduce the impact of the HoDV.
Finally, in the lower plot of Fig. 7, the period of the perturbation is increased to 50c stages. Once the perturbation is slow enough, its impact on the timing error τ − c is reduced even further for the three adaptive clock systems considered here.
Fig. 7 shows that the different adaptive clock generation systems evaluated can reduce the timing error due to an HoDV, but it is hard to say which one achieves the greatest reduction, that is, the best adaptation. To evaluate adaptive clock systems in which the period changes with PVTA, we need to compare the mean clock period when no errors are detected. For this reason, we use as figure of merit the ratio of the mean clock period of the adaptive clock to the fixed clock period, that is, the relative adaptive period T_clk/T_clk,fixed.
Fig. 8 shows T_clk/T_clk,fixed for different scenarios under an HoDV. In the upper plot of Fig. 8 the period of the perturbation is kept fixed, T_e = 100c, and the CDN delay is varied. Up to t_clk/c = 6 the IIR RO is slightly the best option. In the lower plot of Fig. 8 the CDN delay is kept constant and the period of the perturbation is varied. Here the free running RO seems to be the best option in almost every situation; the IIR RO is slightly better only in an interval around T_e = 100c.
To conclude with an example, the HoDV adaptation results can be translated into a period measured in seconds. Let us assume that the set-point c = 64 generates, in ideal conditions, a clock period T_clk = 1 ns. Under a CP delay variation of up to 20% the clock period has to be set to T_clk = 1.2 ns or, in number of stages, the set-point should be changed to c = 77. Assume also that the adaptive clock allows the required c, which ensures error free operation, to be reduced by up to 10%. This translates into a reduction of 0.12 ns in the clock period, which is a 60% reduction of the added SM.
B. Heterogeneous dynamic variation (HeDV)
Fig. 8 showed that, under different conditions, the free running RO is the best or almost the best option to cope with HoDV. However, once we consider HeDV, introduced through a mismatch offset μ as indicated in Fig. 4, the free RO is no longer the best adaptive clock generation option. Fig. 9 shows the relative adaptive period, for different perturbation period and CDN delay scenarios, when there is a mismatch μ of up to 20% between the RO and the TDC, which are also under an HoDV. The SM added to the free RO is set to a constant value that allows error free operation over the whole μ/c range, since the length of the free RO needs to be set during the design stage. Fig. 9 shows that the IIR RO is the best option except for fast perturbations (upper row of Fig. 9), where TEAtime is the best option over almost all the μ/c range. The IIR controlled RO should be chosen to face mid-to-low frequency perturbations or when the clock domain is small enough to maintain a low CDN delay.
To conclude the HeDV adaptation results with an example, assume again that the set-point c = 64 generates, in ideal conditions, a clock period T_clk = 1 ns. Under a delay variation of up to 20% due to HoDV and a further delay variation of up to 20% due to HeDV, the clock period has to be set to T_clk = 1.4 ns or, in number of stages, the set-point should be changed to c = 90. If the CDN delay and perturbation period scenario allows the adaptive clock to reduce the required c, which ensures error free operation, by up to 20%, this translates into a reduction of 0.28 ns in the clock period, which is a 70% reduction of the added safety margin.
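The arithmetic behind the two worked examples (HoDV only, and HoDV plus HeDV) can be reproduced with a short helper. The function and argument names are hypothetical, and rounding the set-point up to the next whole stage is an assumption that happens to match the values 77 and 90 quoted in the text.

```python
import math


def margin_saving(nominal_ns, nominal_stages, variation, set_point_cut):
    """Return (fixed period in ns, fixed set-point, saved ns, fraction of SM saved)."""
    fixed_ns = nominal_ns * (1.0 + variation)
    # Round up so that the margin expressed in stages is never cut short.
    fixed_stages = math.ceil(nominal_stages * (1.0 + variation))
    saved_ns = fixed_ns * set_point_cut
    added_margin_ns = fixed_ns - nominal_ns
    return fixed_ns, fixed_stages, saved_ns, saved_ns / added_margin_ns


if __name__ == "__main__":
    # HoDV only: 20% variation, set-point reduced by 10%.
    print(margin_saving(1.0, 64, 0.20, 0.10))
    # HoDV plus HeDV: 20% + 20% variation, set-point reduced by 20%.
    print(margin_saving(1.0, 64, 0.40, 0.20))
```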
V. CONCLUSIONS
In this paper we studied theoretically the effects of the clock distribution delay in the presence of homogeneous variations, static and dynamic. We showed that, in the presence of homogeneous dynamic variations, the clock distribution delay induces a heterogeneous variation between the clock generation circuit and the CPs in the clock domain. This induced mismatch limits the adaptive clock systems in terms of clock domain size.
We also argued that heterogeneous variations may not be corrected by a concentrated adaptive clock generation system such as a free running RO. To cope with heterogeneous variations we proposed a closed loop architecture with delay sensors (TDCs) distributed across the clock domain.
We proposed a new clocking architecture in which the clock period is not set during the design and/or test stages; instead, the system tries to minimize the difference between the sensor outputs and the given set-point value. The set-point could be varied as a function of the timing errors observed during a time window and/or of the performance needs.
We modelled, at a very high level, the action of dynamic perturbations on the ring oscillator and the TDC sensors as well as the effect of a variation mismatch between them.
Since the proposed system acts as a closed loop, we derived some constraints for the control block when it is an IIR filter.
Finally, we ran a functional simulation showing that the free running ring oscillator cannot be used alone as a source of adaptive clock, since its generated clock adapts only to the variations suffered in the immediate vicinity of the ring oscillator circuit, and that the IIR-controlled ring oscillator generates the best adapted clock signal under heterogeneous variations, which are likely to appear in modern ICs.
