Adaptive clock with useful jitter by Cortadella, Jordi et al.
Adaptive Clock with Useful Jitter
Jordi Cortadella∗, Luciano Lavagno†, Pedro López∗, Marc Lupon∗, Alberto Moreno∗, Antoni Roca∗ and Sachin S. Sapatnekar‡
∗Department of Computer Science, Universitat Politècnica de Catalunya, 08034 Barcelona, Spain
†Department of Electronics, Politecnico di Torino, 10129 Torino, Italy
‡Department of Electrical and Computer Engineering, University of Minnesota, Minneapolis, MN 55455, USA
Abstract—The growing variability in nanoelectronic devices due to
uncertainties from the manufacturing process and environmental conditions
(power supply, temperature, aging) requires increasing design
guardbands, forcing circuits to work with conservative clock frequencies.
Various schemes for clock generation based on ring oscillators have
been proposed with the goal to mitigate the power and performance
losses attributable to variability. However, there has been no systematic
analysis to quantify the benefits of such schemes. This paper presents
and analyzes an Adaptive Clocking scheme with Useful Jitter (ACUJ)
that uses variability as an opportunity to reduce power by adapting
the clock frequency to the varying environmental conditions and, thus,
reducing guardband margins significantly. Power can be reduced between
20% and 40% at iso-performance and performance can be boosted
by similar amounts at iso-power. Additionally, energy savings can be
translated to substantial advantages in terms of reliability and thermal
management. More importantly, the technology can be adopted with
minimal modifications to conventional EDA flows.
I. INTRODUCTION
It is widely recognized that the ultimate limit to Moore’s law is not
technology but economics. Every generation requires an enormous
increase in the non-recurring engineering (NRE) and fabrication
costs. As a result, the cost per transistor, which had been decreasing at
every technology node for several decades, may have been increasing
over the last few nodes [1].
Even in the past decade, purely geometrical scaling has been
limited by physical challenges and lithography issues. As the supply
voltage has stagnated, enhanced performance has been enabled by
the notion of equivalent-scaling [2] in the International Technology
Roadmap for Semiconductors, whereby clever “tricks” have been
used to achieve better performance at the next node.
Further, the benefits of scaling are showing diminishing returns.
While moving to a new technology node implies a 3–4× increase in
manufacturing cost, key metrics such as speed, power, and density
are only confined to a 20–20–20% improvement, respectively [3].
The modest progress in performance metrics is largely determined
by variability: the increasing gap between worst-case and nominal
delays that must be covered by conservative guardband margins.
Today’s methodologies actively fight off variability. Leading-edge
designs use low-variability phase-locked loops (PLLs) for low-jitter
clocks and near-zero-skew trees for clock distribution. They employ
a strict discipline to maintain the rigidity of timing boundaries, with
conservative guardbands used for each combinational block to ensure
that these rigid boundaries are not violated. As the magnitude of on-
chip variability increases, such guardbands incur prohibitive power
and delay overheads.
Fig. 1. Clock generation with PLL and ACUJ.
This paper presents a novel design-based equivalent-scaling “trick”
that embraces dynamic variability instead of fighting it off. The
method, based on a new paradigm called useful jitter, leverages
common-mode variability between the circuit and the clock source
to develop a clocking scheme that is an innovative alternative to
PLL-based approaches. This clocking scheme, coupled with a
chip-wide design methodology, mitigates the margins required
to tolerate dynamic variability, thus reducing the overall chip power.
Unlike the classical goal of attenuating jitter to provide more robust
clock generators, the proposed approach intentionally generates jitter
to accommodate dynamic delay variations. Fig. 1 illustrates the
effect of using useful jitter to reduce margins. The waveforms have
been obtained with SPICE simulations and show the clock signal
arriving at the flip-flops with a power supply fluctuating with ±30%
noise. Although this is larger than the voltage droops seen in typical
systems, it helps to emphasize the benefits of this technology and
could potentially model scenarios in an energy harvesting context.
The PLL (middle) maintains a conservative fixed frequency (810
MHz) to cover the delay variability of the circuit. However, the
adaptive clock dynamically adapts the frequency to the variability.
By preserving the same nominal voltage (1.2V), it can achieve an
average frequency of 1.55 GHz, ranging between 807 MHz and 2.25
GHz. This speed-up can be converted into power savings by scaling
voltage. In the bottom waveform we can observe the clock signal
working at 0.85V and maintaining an average frequency (814 MHz)
similar to that of the PLL working at 1.2V. It is interesting to point
out the low frequency of the clock (291 MHz) when the power supply
approaches Vt (0.28V for NMOS devices). This paper demonstrates
how this intentional jitter generates a robust clock without timing
violations and provides significant power and performance benefits.
The proposed technology is based on the exploitation of the
sensitivity of ring oscillators to PVT variability. Some schemes have
been tentatively explored in the past [4]–[6] but no approach with
quantifiable benefits has been analyzed. This paper proposes a robust
methodology that can be validated by conventional timing sign-off
procedures and analyzes its power and performance benefits.
TABLE I
A TAXONOMY OF THE SOURCES OF VARIABILITY.
          Static    Slow (ms)    Fast (ns)
Global    P, V      V, T, A      V
Local     P, V      V, T, A      V
Fig. 2. Delay distribution for timing sign-off. [Two bars, HVT and LVT, normalized to the typical corner (1.00): global variability (P, V, T) up to the worst corner (1.83 for HVT, 1.76 for LVT), then local variability (OCV) and clock uncertainty, for totals of 2.20 (HVT) and 2.12 (LVT).]
The main features of the Adaptive Clock with Useful Jitter (ACUJ)
can be summarized as follows:
• Efficient: 1.2–1.4× speed-up or 30%–40% power savings with
regard to worst-case sign-off.
• Non-invasive: the original circuit is not modified. ACUJ is an
alternative to PLLs; both clocks can coexist without any
need to modify the clock tree.
• Practical: adoptable in commercial design flows with conventional
sign-off procedures.
• Reliable: substantial improvements in thermal management
and reliability as a byproduct of power reduction.
II. VARIABILITY, MARGINS AND STATIC TIMING ANALYSIS
We next review the main sources of variability and how margins
are taken into account during Static Timing Analysis (STA).
Among the different taxonomies for classifying the sources of
variability, we select one that helps to easily identify the margins
used for timing sign-off. Table I classifies variability according to
two parameters: locality and variation speed.
In terms of locality, global variability affects all devices uniformly
whereas local variability has a different impact for each device. Some
elements of local variability (e.g., voltage or temperature) may exhibit
spatial correlation, i.e., their impact may be similar for devices located
in the same region. In terms of variation speed, we can distinguish
between static and dynamic variability. Process (P) variability is
always static and can be either global (systematic) or local (random).
Temperature (T) and aging (A) have slow variability, e.g. of the order
of milliseconds or even slower. Both sources have global and local
variability components.
Aging will not be discussed in this paper and the analysis will be
left for future work. However, it is important to point out that ACUJ
also brings additional benefits by naturally compensating the aging
of the circuit with the aging of the adaptive clock.
Voltage (V) has a diversity of variability components and deserves
a special discussion. On the one hand, it has DC components
produced by static IR drops that can be either global (off-chip
resistance) or local (on-chip power delivery network). On the other
hand, voltage variability also has AC components determined by the
activity of the system. The largest components of voltage noise have
mid and low frequencies and are global [7]. They are produced by
the internal activity of the chip that generates high current demands
to the voltage regulator and voltage drops across the RLC networks
of the board, package and on-chip power delivery.
Timing sign-off must take into account all the possible sources of
variability and add margins to cover them. Nowadays, the mechanisms
to model variability during STA are the following: (1) library
corners to model global variability, (2) On-Chip Variability (OCV)
derating factors to model local variability, and (3) clock uncertainty
to model jitter and any other safety margin included to account for
any uncovered variability or inaccuracy in the analysis.
Fig. 3. Launching and capturing paths for timing sign-off.
The two bars shown in Fig. 2 depict a typical delay distribution for
timing sign-off. Data has been obtained by simulation of a critical
path using SPICE models from a 65nm commercial library. The
delays for two cell libraries are reported: Low-Vt (LVT) and High-Vt
(HVT). For each library, the delays have been normalized to its
typical corner (1.00ns for TT, 1.2V, 25°C; the delays for HVT are
about 2× those for LVT) and the contributions of the process (P),
voltage (V) and temperature (T) components have been estimated.
Global variability accounts for most of the guardband margins,
with P and V being the dominating components. The worst corner
covers the worst conditions for PVT global variability (e.g., SS
devices, 1.08V, 125°C). Local variability (OCV) also requires a
margin modeled as a derating factor that typically ranges between
5% and 15% in conventional sign-off (15% has been chosen in
the example). Finally, some fixed margin is usually added for clock
uncertainty (0.10ns in the example). Overall, timing sign-off is done
at more than 2× the delay of the typical corner.
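The margin stack-up described above can be checked with a few lines of arithmetic. The following sketch assumes that the sign-off delay is obtained by applying the OCV derating factor to the worst-corner delay and then adding the clock uncertainty; the worst-corner values (1.83 for HVT, 1.76 for LVT) are the ones annotated in Fig. 2, and the function name is illustrative only.

```python
def signoff_delay(worst_corner, ocv_derate=0.15, clock_uncertainty=0.10):
    """Sign-off delay, normalized to the typical corner (1.00).

    Assumed composition: worst-corner delay, scaled by the OCV derating
    factor, plus a fixed clock-uncertainty margin.
    """
    return worst_corner * (1.0 + ocv_derate) + clock_uncertainty


hvt = signoff_delay(1.83)  # worst corner annotated for HVT in Fig. 2
lvt = signoff_delay(1.76)  # worst corner annotated for LVT in Fig. 2
print(f"HVT: {hvt:.2f}x typical, LVT: {lvt:.2f}x typical")
```

Under these assumptions both results agree with the totals annotated in Fig. 2 (2.20 and 2.12), i.e., more than twice the typical delay.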
We next review the base timing analysis to check a setup constraint
for a critical path that goes from flip-flop L to flip-flop C (see Fig. 3).
Two competing paths are involved: launching (L) and capturing
(C). For the circuit to operate correctly, the cycle period (P) must
be sufficiently long to meet the setup constraint for all pairs of
launching/capturing paths (denoted by the set LC):
$$P - J > \max_{i \in LC} \left( L_i - C_i \right) \tag{1}$$
where J is the maximum clock jitter. Timing sign-off must guarantee
that the clock frequency does not violate any timing constraint under
any operating condition. In the presence of variability, margins have
to be added to prevent timing failures, as follows.
During STA, global variability is modeled by library corners. Let
us assume that we have a set of corners that cover different PVT
configurations for devices (fast/typical/slow process, high/typical/low
temperature, high/nominal/low voltage) and interconnect (RCmax,
RCmin, . . . ). Let us call $\mathcal{C}$ the set of corners. For every timing path
$p$, we denote by $p^c$ the delay of the path at corner $c$.
Constraint (1) can now be quantified for all corners and all pairs
of paths to derive the minimum cycle period:
$$P - J > \max_{c \in \mathcal{C},\, i \in LC} \left( L_i^c - C_i^c \right) \tag{2}$$
Any clock period P satisfying (2) guarantees a correct behavior
for all PVT corners considered for STA.
Fig. 4. Incorporating ACUJ in a circuit. [Clock domain driven by either a PLL or an enabled (En) ring oscillator, isolated by CDC structures.]
Local OCV is modeled by applying derating factors to the launching
and capturing paths. These factors can be different for each corner.
Let us denote by $\delta_L$ and $\delta_C$ the derating factors applied to launching and
capturing paths, respectively. Typically, $\delta_L \geq 1$ and $\delta_C \leq 1$. When
incorporating local variability, the setup constraint (2) is as follows¹:
$$P - J > \max_{c \in \mathcal{C},\, i \in LC} \left( \delta_L L_i^c - \delta_C C_i^c \right) \tag{3}$$
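As an illustration of how constraint (3) bounds a fixed clock period, the following sketch evaluates the right-hand side over all corners and path pairs. The function name, the data layout and the delay values are hypothetical and are not taken from the paper; with both derating factors set to 1 the function reduces to constraint (2).

```python
def min_pll_period(paths, delta_l=1.0, delta_c=1.0, jitter=0.0):
    """Strict lower bound on a fixed period P according to constraint (3).

    paths maps a corner name to a list of (L, C) delay pairs, one per
    launching/capturing path pair. Any P above the returned value meets
    P - J > max(delta_l * L - delta_c * C) over all corners and pairs.
    """
    worst = max(
        delta_l * launch - delta_c * capture
        for pairs in paths.values()
        for launch, capture in pairs
    )
    return worst + jitter


# Hypothetical delays in ns; not taken from the paper.
paths = {
    "typical": [(1.40, 0.40), (1.30, 0.35)],
    "worst": [(2.50, 0.70), (2.30, 0.60)],
}
bound = min_pll_period(paths, delta_l=1.15, delta_c=0.95, jitter=0.10)
print(f"P must exceed {bound:.3f} ns")
```

Note that a single period has to cover the slowest corner, so the typical-corner paths play no role in the result.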
Next, we describe how this constraint is adapted to the clocking
scheme proposed in this paper.
III. ADAPTIVE CLOCK WITH USEFUL JITTER
Fig. 4 depicts ACUJ, the non-invasive clocking scheme proposed
in this paper. It is a clock generator with useful jitter that continuously
adapts to the global variations suffered by the circuit. The clock
source is a ring oscillator built as a delay chain of logic gates. It can
be physically located anywhere within the clock domain, so that it
suffers the same global variability as the circuit, as shown in Fig. 1.
The only required margins must cover the local variations between the
ring oscillator and the critical paths in the circuit.
ACUJ can be applied to any clock domain. The frequency of the
clock will be continuously fluctuating around a nominal frequency.
The frequency at any time instant will be determined by the actual
operating conditions at that instant. Each clock domain will have
to be isolated with clock-domain crossing (CDC) structures (e.g.,
asynchronous FIFOs) to prevent synchronization errors with the
neighboring domains. This is also a requirement for those schemes
that adjust the PLL frequency as a reaction to voltage droop detectors,
or for Dynamic Voltage and Frequency Scaling (DVFS) schemes.
A. Timing Model for Adaptive Clock with Useful Jitter
The previous analysis assumes that the period P is obtained from
a PLL with a fixed frequency. PLLs are attractive because they
sustain the same frequency even in the presence of variability; for
the same reason, they cannot adapt to it. Ring oscillators are often
shunned because they supposedly have a large jitter. However, even at
the core of a PLL, there is a Voltage Controlled Oscillator (VCO)
built with logic gates (e.g., current-starved inverters, to control its
frequency). These gates suffer from the same variation sources
as the ring oscillator that we discuss here, but the resulting jitter
is minimized rather than exploited. Moreover, the VCO has noise
sources similar to those of the ring oscillator that we use, which also
result in unwanted jitter.
Let us now assume that the clock generator is designed as an
oscillator using the same type of components as the ones used
for the combinational logic and clock trees (e.g., logic gates and
buffers). Let us also assume, for the sake of simplicity, that the delays
of all components in the circuit scale uniformly with voltage and
temperature². In this case, the period of the clock would naturally
adapt to the process corner and operating conditions of the circuit.
¹ For simplicity, we assume the same derating factors for all corners.
Fig. 5. Symbolic timing model for Adaptive Clock with Useful Jitter. [Launching path: clock tree latency followed by the critical path from L to C; capturing path: ring oscillator period followed by clock tree latency.]
Fig. 5 depicts a symbolic representation of the components affected
by variability when using ACUJ. The horizontal dimension represents
time. The top and bottom paths represent the launching and capturing
paths, respectively. The launching path includes the clock tree
(shaded) and the critical path delay (white) from flip-flop L to flip-
flop C (flip-flops are assumed to have zero delay in this model). The
capturing path includes the delay of the ring oscillator (white) and
the clock tree (shaded). The paths in the model are equivalent to the
ones shown in Fig. 3, explicitly substituting the clock generator by
a ring oscillator.
The bullets in the diagram represent signal pulses flying in the
launching and capturing paths. In general, the clock tree may contain
several flying pulses. Let us assume that the top and bottom bullets
are perfectly aligned in the absence of variability and that all
components have the same delay d.
Let us call P the time separation (period) between consecutive
bullets. With this assumption, an infinite stream of pairs of bullets
will arrive synchronized at flip-flop C every P time units.
Let us now assume that voltage drops to a point in which all
components are slowed down by a factor s, i.e. every component has
delay s · d. Then, the time separation between bullets (period) will
be increased to s · P but the bullets would still be perfectly aligned
in time, i.e. all the bullets will run in slow motion but at the same
speed. It is important to notice that the alignment/misalignment of
the bullets is independent from the clock tree latency in this model.
If a PLL were used instead, the period produced by the PLL
would not scale with voltage and the bullets in the capturing path
would be misaligned with regard to the ones flying in the launching
path, thus producing setup violations. The only way to avoid the
timing violations would be to add margins to the capturing path
(i.e., reduce the frequency of the PLL) in such a way that the bullets
in the capturing path always arrive later.
Hence, the previous model shows how margins for global variability
can be eliminated when using ring oscillators and only margins
for local variability are required. Let us now study how this effect
can be formally modeled in terms of STA constraints.
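The symbolic model of Fig. 5 can be sketched in a few lines. The example below assumes that every delay in the circuit scales by the same factor s and compares the setup slack at flip-flop C for a ring oscillator (period scales with s) and for a PLL (fixed period). The function name and the numeric values are hypothetical.

```python
def setup_slack(period, tree, critical, slowdown, adaptive):
    """Setup slack at flip-flop C in the symbolic model of Fig. 5.

    All circuit delays scale by `slowdown`. With an adaptive clock the
    period scales as well; with a PLL it stays fixed.
    """
    launch = slowdown * (tree + critical)        # clock tree + critical path
    clock_period = slowdown * period if adaptive else period
    capture = clock_period + slowdown * tree     # next pulse through the tree
    return capture - launch


# Hypothetical values (ns): period 1.0, clock tree 0.6, critical path 0.9.
for s in (1.0, 1.2, 1.5):
    ro = setup_slack(1.0, 0.6, 0.9, s, adaptive=True)
    pll = setup_slack(1.0, 0.6, 0.9, s, adaptive=False)
    print(f"s={s}: ring oscillator slack {ro:+.2f} ns, PLL slack {pll:+.2f} ns")
```

In this sketch the clock tree latency cancels out of the slack, and the slack of the adaptive clock never changes sign, whereas the fixed period becomes insufficient once the slowdown is large enough.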
B. Static Timing Analysis of Adaptive Clock with Useful Jitter
By using a ring oscillator, a different cycle period $P^c$ is generated
at every corner $c$. The setup constraint can now hold at every corner
$c$ with a different period $P^c$, i.e.,
$$\forall\, c \in \mathcal{C},\ i \in LC:\quad \delta_L L_i^c < \delta_C \left( P^c + C_i^c \right) \tag{4}$$
where the term $J$ (jitter) has been removed, because the fluctuations
of the oscillator are accounted for as local variability by applying the
derating factor $\delta_C$ to $P^c$. The previous inequality can be rewritten
as follows:
$$\forall\, c:\quad P^c > \max_{i \in LC} \left( \frac{\delta_L L_i^c}{\delta_C} - C_i^c \right) \tag{5}$$
² These assumptions are only made to simplify the conceptual discussion
about adaptive clocks and global variability. Any deviation from these
assumptions, including those due to gate versus wire delays, must be
included in the derating factors used in constraints (3) and (4).
Fig. 6. Ring oscillator delay covering PVT variations of critical paths. [Horizontal axis: PVT corners (c); vertical axis: delay; curves: (L, C) pair constraints $\delta_L L^c / \delta_C - C^c$, ring oscillator period ($P^c$), and PLL period (P).]
A fundamental difference with (3) is that a different clock period
$P^c$ is obtained at every corner. This means that the causes of delay
change in the clock generator (ring oscillator) and the circuit are
exploited, as long as they are correlated, instead of being minimized
in the clock generator (PLL VCO) and taken as margin in the
circuit. This allows us to reduce margins substantially, as will be
corroborated by the experiments.
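To make the per-corner nature of constraint (5) concrete, the following sketch computes the lower bound on the ring oscillator period at each corner separately. The function name, the data layout and the delay values are hypothetical; only the formula comes from (5).

```python
def min_adaptive_periods(paths, delta_l, delta_c):
    """Lower bound on the ring-oscillator period P^c at each corner, per (5)."""
    return {
        corner: max(delta_l * launch / delta_c - capture for launch, capture in pairs)
        for corner, pairs in paths.items()
    }


# Hypothetical delays in ns; not taken from the paper.
paths = {
    "typical": [(1.40, 0.40), (1.30, 0.35)],
    "worst": [(2.50, 0.70), (2.30, 0.60)],
}
bounds = min_adaptive_periods(paths, delta_l=1.15, delta_c=0.95)
for corner, bound in bounds.items():
    print(f"{corner}: P^c must exceed {bound:.3f} ns")
```

With these illustrative numbers, the bound at the typical corner is much smaller than the bound at the worst corner, which is the gap that a fixed-frequency clock cannot exploit.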
C. Implementing a Reliable Adaptive Clock with Useful Jitter
If we compare the top and bottom paths of Fig. 5, we observe the
need to match the delay of the ring oscillator with that of the critical
path. However, the critical path of the picture is just an abstraction of
the multiple critical paths that may determine the clock period under
different operating conditions.
In our synthesis framework, we have a path synthesizer that
generates a delay that closely matches the delay of the circuit under
different operating conditions. The main features of the synthesizer
(the details are out of the scope of this paper) are:
• All the PVT corners available for STA are used to calculate
the clock period at each corner. OCV derating factors and
uncertainty margins are applied to obtain the minimum value of $P^c$
at each corner according to constraint (5).
• The path synthesizer generates a delay that meets all the
constraints for $P^c$. Standard state-of-the-art timing models
(e.g., NLDM) and algorithmic techniques are used to solve a
combinatorial optimization problem.
• After the synthesis of the ring oscillator, standard STA is used
to sign off at every corner according to (5).
Fig. 6 illustrates the delay constraints for the path synthesizer. The
discrete set of points on the horizontal axis represent PVT corners
(in increasing order of delay). The lines below the shaded region
represent critical path constraints. Each line connects the points of
an (Li, Ci) pair at different corners according to constraint (5). Each
pair may have a different sensitivity to PVT variations.
IV. REDUCING MARGINS: PREVIOUS WORK
A. Reducing margins for static variability: parametric binning
Parametric binning [8], either for speed or voltage, is a common
technique for reducing margins when worst-case sign-off cannot satisfy
certain parameters. In many cases, binning is used to classify dies
and assign different prices according to their performance metrics.
TABLE II
MARGINS REQUIRED TO COVER VARIABILITY.
Worst-case sign-off Parametric Binning Adaptive Clocks
Static Dyn Static Dyn Static Dyn
Global √ √ √
Local √ √ √ √
Table II summarizes the margins that must be applied to three
different scenarios for timing closure: Worst-case sign-off, Parametric
Binning and ACUJ. In case of worst-case sign-off, margins must
be used for any kind of variability (global/local, static/dynamic).
When doing parametric binning, the margins for static variability
can be reduced, since at-speed testing can determine a safe operating
frequency. Still, margins for dynamic variability are required because
a PLL cannot adapt to the fluctuating operating conditions of the
circuit. As we discuss in Section VI, adapting the clock to these
conditions can improve performance or power by 1.6× with respect to
worst-case sign-off and by 1.2× even with respect to speed binning
under realistic hypotheses.
ACUJ offers additional advantages with regard to binning:
• Margins can also be reduced for global dynamic variability.
• No at-speed testing is required.
• Margins are certified by STA tools and do not depend on the
exhaustiveness of at-speed testing vectors.
With ACUJ, every die runs at its natural speed, which is determined
by its process characteristics and instantaneously adapts to the
dynamic operating conditions. There is no need to do binning for a
die to run at its natural speed.
Interestingly, ACUJ also improves the speed of dies that fall
into the slow/slow process corner. The reason is simple: worst-case
corners used in STA assume worst-case VT operating conditions (e.g.,
125°C, −10% V), whereas circuits usually work at more favorable
conditions. ACUJ can work at the average frequency determined by
the nominal voltage, even though it can tolerate large voltage droops.
A similar reasoning applies to the temperature ranges of
the circuit. Circuits driven by PLLs, in contrast, must work at a
conservative frequency determined by the worst-case conditions.
B. Reducing margins for dynamic variability
Various techniques have been proposed to mitigate the impact
of dynamic variability. One of the most aggressive is Razor [9]
and some variants based on a similar concept (e.g., [10]). They
reduce the clock period at the expense of tolerating timing errors.
The main drawback of Razor-like techniques is the significant area
overhead for error detection and correction, which involves intricate
schemes to cope with metastability and architectural support for
flushing the pipeline and replaying instructions. Along the same
lines, Tribeca [11] proposes to use ECC-protected data and local
recovery mechanisms to reduce guardband margins and work at
nominal conditions.
All the previous techniques can only be applied to advanced
microprocessors that incorporate schemes for error detection and
recovery. The benefits are around 30–50% power reduction,
similar to those of the approach presented in this paper.
The most important dynamic variations are produced by voltage
supply droops. Recently, various approaches have been proposed
based on techniques for droop detection and adaptive clocking [12]–
[15]. Based on the fact that voltage droops last several cycles, droop
detectors can be used to anticipate the arrival of the cycles with the
largest droop amplitude. All the previously cited techniques propose
digital schemes for droop detection based on perceiving differences
or timing violations in delay lines or critical path monitors. After
detection, different reaction schemes are proposed. One possible
reaction is to quickly modify the clock frequency generated by a
DLL [12], [14], [15]. Another possibility is to stop the clock during
the droop until the voltage has recovered to a stable level [13].
Fig. 7. Saved margins for droop detection mechanisms. [Labels: nominal voltage, droop detection threshold, DLL reaction time, saved margins.]
Fig. 8. Margins for sign-off in different scenarios.
The main limitation of the mechanisms based on droop detection
is the reaction latency to modify the clock frequency. Fig. 7 shows a
droop that is detected when the voltage goes below a certain threshold.
Above that threshold, voltage noise must be guardbanded with
margins. Once the droop has been detected, some control logic must
be activated to modify the clock frequency (DLL reaction time).
During that time, the voltage continues to fall and margins are also
needed to compensate for the increasing delays. After that, the margins
required to tolerate the maximum droop amplitude can be saved.
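The margin accounting of Fig. 7 can be sketched numerically. The example below assumes a droop that falls at a constant slope, which is a simplification of a real droop waveform; the function name and all numeric values are hypothetical and are not taken from the cited works.

```python
def droop_margins(v_nominal, v_threshold, v_min, slope, reaction_time):
    """Split a droop into guardbanded and saved voltage margin (volts).

    slope is the (assumed constant) fall rate in V/ns and reaction_time
    is the latency in ns between detection and the new clock frequency.
    """
    total_droop = v_nominal - v_min
    before_detection = v_nominal - v_threshold   # noise above the threshold
    during_reaction = slope * reaction_time      # keeps falling while reacting
    guardbanded = min(total_droop, before_detection + during_reaction)
    return guardbanded, total_droop - guardbanded


# Hypothetical numbers: 1.2 V nominal, detection at 1.14 V, droop down to 1.02 V.
guard, saved = droop_margins(1.2, 1.14, 1.02, slope=0.01, reaction_time=4.0)
print(f"guardbanded: {guard:.2f} V, saved: {saved:.2f} V")
```

In this simplified model, a slower reaction or a lower detection threshold directly reduces the saved margin, and a sufficiently slow reaction saves nothing.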
Moreover, these schemes do not exploit the fact that short-term
voltage variations typically have zero average value over relatively
short time intervals (e.g. a few µs), because they are due to second-
order inductive effects of the power distribution network. On the other
hand, as discussed in Section III-A, the performance of a circuit
driven by ACUJ can be guaranteed over that time interval. This is
essential to ensure functionality of circuits that must satisfy hard
external performance constraints.
V. BENEFITS OF ADAPTIVE CLOCKS WITH USEFUL JITTER
The reduction of margins offered by ACUJ results in substantial
power and performance benefits that depend on the technology and
the application domain. We next give a qualitative estimation of these
benefits based on the conventional methodologies used for timing
sign-off (the following section provides more quantitative analysis).
Fig. 8 (left) shows the margins used in STA for a corner-based sign-off.
The horizontal axis represents the [µ . . . µ+3σ] range of process
variability for a particular distribution of dies³ and the vertical axis
represents the cycle period.
The bullets represent the delay obtained at the typical corner (TC)
and worst-case corner (WC). Local (on-chip) variability is added as
a derating factor applied to the delay determined by the corner.
Finally, a constant margin is added for clock uncertainty (jitter,
inaccuracies in the timing models, etc.). The clock period for
Worst-Case Sign-off is given by the addition of all the previous margins on
the delays determined by the WC corner.
Fig. 8 (right) depicts the margins required for ACUJ and the
performance difference with regard to “Speed Binning” and “WC
Sign-off”. The benefits come from the elimination of the margins for
dynamic global variability.
³ For simplicity, we focus on the positive segment of the distribution and
disregard the negative interval approaching the best corner.
A few points are of special interest. Point W represents
the cycle period for WC Sign-off. Point A represents the cycle period
for a die with typical process variation using ACUJ. This point also
assumes that the system works at a nominal average voltage and
temperature. The difference between A and W represents the benefits
for a typical die when no speed binning is applied.
When comparing ACUJ with WC sign-off, the benefits depend
on the process characteristics of each die. For WC sign-off, all dies
are specified to run at a unique clock period that is calculated to
guarantee a certain yield. ACUJ allows each die to run at its natural
speed, which is mostly limited by the process characteristics of the
manufactured devices. With regard to the environmental parameters
(voltage and temperature) the performance is determined by their
average value instead of their worst value.
Point C represents the cycle period achievable by a worst-case-
process die using ACUJ. The difference between W and C is
determined by the global VT variability.
Speed binning can reduce margins for process variability according
to the process attributes of each die. An ideal binning procedure
would determine the clock frequency by only the margins required
for dynamic and local variability. Point B represents the achievable
performance for a typical die. Again, the difference between A and
B is determined by the global VT variability. It can also be observed
that no benefits are obtained between speed binning and WC Sign-off
for worst-case dies.
The region between lines B-W (Speed Binning) and A-C (ACUJ)
represents the benefits of adapting to global VT variability.
VI. EXPERIMENTS
The benefits of the scheme presented in this paper have been
evaluated in two different scenarios: electrical simulations and an
FPGA prototype. The results are consistent and demonstrate significant
benefits in terms of performance and power when using ACUJ.
A. Electrical simulations
The experiments were performed with a 65nm low-Vt commercial
library and an AES module [16] synthesized with Synopsys’ Design
Compiler®. A programmable ring oscillator was also synthesized
using gates from the same cell library. Synopsys PrimeTime® was
used to generate a SPICE netlist including the top 5 critical paths
of the circuit and the ring oscillator. The critical paths were totally
disjoint and obtained from different corners. This is a good trade-off
between selecting representative timing paths and making the SPICE
simulations affordable. The simulations were customized to toggle
the inputs of the launching flip-flops at every cycle.
Global voltage variations were modeled by applying sinusoidal
fluctuations with different amplitudes at frequencies fully misaligned
with the clock frequency, as shown in Fig. 1. No local variability was
assumed in this simulation framework.
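The effect of a sinusoidal supply on both clocking schemes can be illustrated with a toy model. The sketch below is not the SPICE setup used in the experiments: it assumes an alpha-power-law delay model (delay proportional to V / (V − Vt)^α) with an assumed α = 1.3, uses the 0.28V threshold mentioned in Section I, and treats the PLL as fixed at the frequency that is safe at the deepest droop. All function names are illustrative.

```python
import math


def frequency(v, v_t=0.28, alpha=1.3, k=1.0):
    """Alpha-power-law estimate: gate delay ~ V / (V - Vt)^alpha (assumed model)."""
    return k * (v - v_t) ** alpha / v


def compare_clocks(v_nominal=1.2, noise=0.30, samples=10_000):
    """Return (PLL frequency, ACUJ average frequency) under sinusoidal noise."""
    volts = [
        v_nominal * (1.0 + noise * math.sin(2.0 * math.pi * n / samples))
        for n in range(samples)
    ]
    freqs = [frequency(v) for v in volts]
    pll = min(freqs)                 # fixed clock must be safe at the deepest droop
    acuj = sum(freqs) / len(freqs)   # adaptive clock tracks the supply; time average
    return pll, acuj


pll, acuj = compare_clocks()
nominal = frequency(1.2)
print(f"PLL: {pll / nominal:.2f} of nominal, ACUJ average: {acuj / nominal:.2f} of nominal")
```

Because the model is only a first-order approximation, its numbers are not expected to match the simulated results; it only reproduces the qualitative trend that the adaptive clock stays close to the nominal frequency while the fixed clock is set by the worst voltage.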
Table III reports the maximum average frequency achieved without
timing violations at each PVT corner. To evaluate the benefits of
ACUJ with regard to PLL, different scenarios must be considered.
We assume that the chip would work at a nominal voltage with ±10%
fluctuations and at an average temperature of 75°C.
• Worst-case sign-off. In this case, the frequency of the
PLL (0.89GHz) would be determined by the worst corner
(SS, 1.08V, 125°C). The average frequency of ACUJ would
depend on the process parameters of the die and the average
operating conditions. For a typical die (TT) the frequency would
be 1.45GHz. Even in the case of a slow die (SS), the frequency
would be 1.13GHz.
• Speed binning. The margins for process variations would be
mostly reduced for the PLL. Still, the margins for dynamic
variability should be kept. For a typical die, the PLL could
run at 1.19GHz (−10%, 125°C) whereas ACUJ would run at
1.45GHz.
TABLE III
FAVG (GHZ) FOR DIFFERENT PVT PARAMETERS USING PLL AND ACUJ.
                          Typical (TT)      Worst (SS)
Voltage       Temp.       PLL     ACUJ      PLL     ACUJ
1.2V          25°C        1.59    1.56      1.22    1.21
              75°C        1.46    1.45      1.13    1.13
              125°C       1.39    1.38      1.07    1.07
1.2V ± 10%    25°C        1.35    1.55      0.98    1.20
              75°C        1.27    1.45      0.94    1.13
              125°C       1.19    1.37      0.89    1.07
1.2V ± 30%    25°C        0.81    1.49      0.55    1.16
              75°C        0.79    1.40      0.53    1.10
              125°C       0.78    1.34      0.52    1.05
Therefore, speed-ups ranging from 1.27x to 1.63x are obtained with
regard to worst-case sign-off depending on the process parameters
(delays between TT and SS). When comparing with speed binning,
the speed-ups range from 1.22x to 1.27x.
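The speed-ups quoted above follow directly from the entries of Table III for 1.2V ± 10%. The short sketch below recomputes them; the dictionary layout and variable names are illustrative, while the frequencies are the ones reported in the table.

```python
# Average frequencies (GHz) from Table III, nominal 1.2 V with +/-10% noise.
PLL = {("TT", 125): 1.19, ("SS", 125): 0.89}
ACUJ = {("TT", 75): 1.45, ("SS", 75): 1.13}

wc_signoff = PLL[("SS", 125)]  # one PLL frequency for every die

for process in ("TT", "SS"):
    acuj = ACUJ[(process, 75)]
    binned = PLL[(process, 125)]  # ideal speed binning: per-die PLL frequency
    print(
        f"{process}: {acuj / wc_signoff:.2f}x vs. worst-case sign-off, "
        f"{acuj / binned:.2f}x vs. speed binning"
    )
```

For a slow (SS) die the two baselines coincide, since speed binning cannot improve on worst-case sign-off for worst-case dies.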
An important observation is the high robustness to voltage noise,
even when the supply changes by ±30%. While the PLL has to
drastically reduce frequency to tolerate voltage droops (e.g., from
1.39 GHz down to 0.78 GHz for a typical die at 125°C), ACUJ only
needs a very small reduction (from 1.38 GHz down to 1.34 GHz). Thus,
ACUJ is a resilient solution for systems living in hostile environments
with unreliable power supplies, like low-cost regulators or energy
scavenging scenarios.
When considering local variability, some derating factors would
be applied to both cases, PLL and ACUJ, but the benefits and
conclusions would be similar.
B. FPGA prototype
The tolerance to global variability was evaluated using an FPGA
(Xilinx Spartan 3E) implementing the same AES module [16] and
connected to an oscillating power supply. The clock tree was con-
nected to a multiplexer capable of selecting between a conventional
PLL and a ring oscillator constructed by a chain of CLBs (similar to
the scheme shown in Fig. 4).
The experiments measured the maximum error-free frequency
achievable by the FPGA under different voltages and fluctuations.
Given that the experiments were performed on the same die, no
process variations were measured. For this reason, these experiments
estimated the benefits of ACUJ when compared with perfect speed
binning. The impact of temperature was not considered.
Fig. 9 plots the maximum frequency that was achieved under dif-
ferent amplitudes of voltage noise. The power supply was generated
as a low-frequency sinusoidal signal, simulating the effect of an
unregulated power supply (higher frequency variations would be cut
by the on-chip decoupling capacitors). The noise amplitude ranged
from 0% to ±30%; at ±30% the supply swings over [0.84V . . . 1.56V].
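The supply waveform described above can be modeled as a sinusoid around the nominal voltage. The sketch below is a minimal illustration under that assumption: the 1 kHz noise frequency is a placeholder (the text only states that the signal is low-frequency), and the function names are hypothetical.

```python
import math

V_NOM = 1.2  # nominal supply voltage in volts, as in the FPGA experiment

def supply_voltage(t, amplitude, freq_hz=1.0e3):
    """Supply voltage at time t (seconds) for a relative noise amplitude.

    amplitude is a fraction of V_NOM (0.30 means ±30%).
    freq_hz is an assumed value; the text only says "low-frequency".
    """
    return V_NOM * (1.0 + amplitude * math.sin(2.0 * math.pi * freq_hz * t))

def supply_range(amplitude):
    """Minimum and maximum supply voltage for a relative noise amplitude."""
    return (V_NOM * (1.0 - amplitude), V_NOM * (1.0 + amplitude))

for a in (0.0, 0.10, 0.20, 0.30):
    v_min, v_max = supply_range(a)
    print(f"±{a:.0%}: [{v_min:.2f}V, {v_max:.2f}V]")
```

For ±30% noise the range evaluates to the [0.84V, 1.56V] interval reported in the text.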
As expected, the PLL frequency had to be reduced to keep the
circuit operating correctly. However, ACUJ could sustain an almost
constant average frequency across a large range of voltage noise. The
speed-up of ACUJ with regard to a PLL was 1.19x, 1.39x and 2.3x
for ±10%, ±20% and ±30% voltage noise, respectively.
Fig. 10 reports the power benefits for different voltage levels and
±10% voltage noise. The vertical arrows (↓) indicate the power
savings obtained by voltage scaling at a given average frequency
(−23% at 120 MHz and −25% at 100 MHz). The diagonal arrows
(↗) connect iso-voltage points and represent the speed-up obtained
by simply using ACUJ instead of a PLL without changing voltage.
The results are consistent with the savings estimated by SPICE
simulations under the assumption that dies are perfectly binned.
Fig. 9. FPGA: Voltage Noise vs. Frequency for 1.2V nominal voltage.
Fig. 10. FPGA: Frequency-Power plot for ±10% voltage noise.
VII. CONCLUSIONS
After the happy-scaling days, it is time to find mechanisms that can
maximally exploit the capabilities of technology nodes at nanometric
scale. Adaptive Clock with Useful Jitter emerges as an innovative
paradigm to handle variability and an alternative to paying the
exorbitant costs of guardband margins.
REFERENCES
[1] H. Jones, “Why migration to 20nm bulk CMOS and 16/14nm FinFETs is
not best approach for the semiconductor industry,” International Business
Strategies, Los Gatos, CA, Tech. Rep., Jan. 2014.
[2] A. B. Kahng, “Scaling: More than Moore’s law,” IEEE Design & Test,
vol. 27, no. 3, May/June 2010.
[3] ——, “Lithography-induced limits to scaling of design quality,” in Proc.
SPIE, vol. 9053, 2014, pp. 905302-1–905302-14.
[4] T. D. Burd, T. A. Pering, A. J. Stratakos, and R. W. Brodersen, “A
dynamic voltage scaled microprocessor system,” IEEE Journal of Solid-
State Circuits, vol. 35, no. 11, pp. 1571–1580, 2000.
[5] J. Perez-Puigdemont, A. Calomarde, and F. Moll, “Variation tolerant
self-adaptive clock generation architecture based on a ring oscillator,” in
IEEE International SOC Conference (SOCC), 2012, pp. 387–392.
[6] M. Garg, C. Chai, and J. Bridges, “Adaptive clock generators systems
and methods,” Patent application US 2011/0140752 A1, Jun. 16, 2011.
[Online]. Available: http://www.google.com/patents/WO2011081951A1
[7] S. Pant, E. Chiprout, and D. Blaauw, “Power Grid Physics and Implica-
tions for CAD,” IEEE Design & Test of Computers, vol. 24, no. 3, pp.
246–254, 2007.
[8] B. Cory, R. Kapur, and B. Underwood, “Speed binning with path delay
test in 150-nm technology,” IEEE Design & Test of Computers, vol. 20,
no. 5, pp. 41–45, Sep. 2003.
[9] D. Ernst, N. S. Kim, S. Das, S. Pant, R. Rao, T. Pham, C. Ziesler,
D. Blaauw, T. Austin, K. Flautner, and T. Mudge, “Razor: a low-power
pipeline based on circuit-level timing speculation,” in Proc. IEEE/ACM
International Symposium on Microarchitecture, 2003, pp. 7–18.
[10] K. Bowman, J. Tschanz, N. Kim, J. Lee, C. Wilkerson, S. Lu, T. Karnik,
and V. De, “Energy-efficient and metastability-immune resilient circuits
for dynamic variation tolerance,” IEEE Journal of Solid-State Circuits,
vol. 44, no. 1, pp. 49–63, Jan. 2009.
[11] M. S. Gupta, J. A. Rivers, P. Bose, G.-Y. Wei, and D. Brooks,
“Tribeca: design for PVT variations with local recovery and fine-
grained adaptation,” in Proc. IEEE/ACM International Symposium on
Microarchitecture, 2009, pp. 435–446.
[12] K. Chae and S. Mukhopadhyay, “All-digital adaptive clocking to tolerate
transient supply noise in a low-voltage operation,” IEEE Transactions
on Circuits and Systems II: Express Briefs, vol. 59, no. 12, pp. 893–897,
Dec. 2012.
[13] K. Bowman, C. Tokunaga, T. Karnik, V. De, and J. Tschanz, “A 22
nm all-digital dynamically adaptive clock distribution for supply voltage
droop tolerance,” IEEE Journal of Solid-State Circuits, vol. 48, no. 4,
pp. 907–916, Apr. 2013.
[14] C. Lefurgy, A. Drake, M. Floyd, M. Allen-Ware, B. Brock, J. Tierno,
J. Carter, and R. Berry, “Active guardband management in Power7+ to
save energy and maintain reliability,” IEEE Micro, vol. 33, no. 4, pp.
35–45, Jul. 2013.
[15] K. Wilcox, R. Cole, H. R. Fair III, K. Gillespie, A. Grenat, C. Henrion,
R. Jotwani, S. Kosonocky, B. Munger, S. Naffziger, R. Orefice, S. Pant,
D. Priore, R. Rachala, and J. White, “Steamroller module and adaptive
clocking system in 28 nm CMOS,” IEEE Journal of Solid-State Circuits,
vol. 50, no. 1, pp. 24–34, 2015.
[16] M. Litochevski and L. Dongjun, “High throughput and low area
AES,” 2012. [Online]. Available:
http://opencores.org/project,aes_highthroughput_lowarea
