Low-cost on-chip clock jitter measurement scheme by Omana, Martin et al.
This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.
IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS 1
Low-Cost On-Chip Clock Jitter
Measurement Scheme
Martin Omaña, Daniele Rossi, Daniele Giaffreda, Cecilia Metra, Fellow, IEEE, T. M. Mak,
Asifur Rahman, and Simon Tam, Senior Member, IEEE
Abstract— In this paper, we present a low-cost, on-chip clock
jitter digital measurement scheme for high performance micro-
processors. It enables in situ jitter measurement during the test or
debug phase. It provides very high measurement resolution and
accuracy, despite the possible presence of power supply noise
(representing a major source of clock jitter), at low area and
power costs. The achieved resolution is scalable with technology
node and can in principle be increased as much as desired,
at low additional costs in terms of area overhead and power
consumption. We show that, for the case of high performance
microprocessors employing ring oscillators (ROs) to measure
process parameter variations (PPVs), our jitter measurement
scheme can be implemented by reusing part of such ROs, so that
clock jitter can be measured with a very limited cost increase
compared with PPV measurement only, and with no impact on
parameter variation measurement resolution.
Index Terms— Clock jitter, high performance microprocessor,
jitter measurement.
I. INTRODUCTION
CLOCK is one of the most critical signals in any synchronous system, which has to be distributed throughout
the chip using a complex network [1]. With the scaling of
technology and increase in clock frequency, it is becoming
increasingly difficult to guarantee the correctness of clock
signals, due to the increasing likelihood of manufacturing
defects, clock jitter, duty-cycle distortion, process parameter
variations (PPVs) and power supply noise (PSN) [2]–[4].
Jitter affecting the clock signal produces uncertainties in its
period and rising/falling edges, thus forcing designers either to
increase the time margins or to face the possibility of operating
malfunctions. For high performance microprocessors,
the adoption of minimum time margins is desirable, so that
on-chip jitter measurement should be performed during the
Manuscript received March 7, 2013; revised September 21, 2013 and
January 31, 2014; accepted February 23, 2014.
M. Omaña, D. Rossi, D. Giaffreda, and C. Metra are with the Uni-
versity of Bologna, Bologna 40136, Italy (e-mail: martin.omana@unibo.it;
d.rossi@unibo.it; daniele.giaffreda2@unibo.it; cecilia.metra@unibo.it).
T. M. Mak was with Intel Corporation, Santa Clara, CA 95054 USA.
He is now with GlobalFoundries, Sunnyvale, CA 94085 USA (e-mail:
tm.mak@globalfoundries.com).
A. Rahman is with Intel Corporation, Portland, OR 97007 USA (e-mail:
asifur.rahman@intel.com).
S. Tam, retired, was with Intel Corporation, Santa Clara, CA 95054 USA
(e-mail: simon.tam@ieee.org).
Color versions of one or more of the figures in this paper are available
online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TVLSI.2014.2312431
test or debug phase to validate the design and manufacturing
assumptions for the clock. PSN modulating the delay of
the clock signal is currently recognized as one of the
main causes of clock jitter [4]. It is expected to increase
with technology scaling, due to the increasing complexity
and integration density, resulting in high switching activities
[4]–[6].
Like clock jitter, PPV occurring during fabrication
are increasingly likely and significant with technology
scaling. They may induce either performance degradation or
operating malfunctions [7]. Therefore, PPV also mandate on-die
measurement during the test and debug phase to validate
design and process, possibly drive speed-binning, and eventually
dictate design process improvements.
Moreover, if PPV affect the buffers of the clock distribution
network, they can cause clock skew [8]–[10]. Deskew buffers
can be employed to compensate for PPV produced effect [8],
but their application is typically still limited to some portions
of the whole clock distribution network only (e.g., global
distribution), due to cost limitations.
Several measurement schemes have been proposed for clock
jitter [4]–[7], [11]–[15] and PPV [7], [16], [17]. The use of
ring oscillators (ROs) for PPV measurement is widely assessed
and adopted. Instead, schemes for clock jitter measurement are
not as well established yet, mainly because of limits in their
measurement resolution and accuracy [15].
In [14] and [15], jitter measurement schemes based on
Vernier delay lines (VDL) have been proposed. They employ
an additional delay-locked loop to calibrate the delay of
the elements within the VDL against process, temperature,
and voltage variations. Although these techniques provide a
high measurement resolution, they imply a considerable area
overhead.
In [13], a circuit based on a NOT chain delay line has been
proposed. It features resolution equal to a NOT delay, and
requires a considerable area overhead.
Finally, a circuit consisting of latches and NOT chains has
been presented in [4]. It features a measurement resolution
equal to an inverter delay, which can be calibrated to compen-
sate the effects of PSN and PPV on the provided measurement.
However, the required area overhead and power consumption
are not negligible.
Based on the limitations of the approaches proposed so far
to achieve high jitter measurement resolution and accuracy
at limited costs, in this paper, we present a new on-chip
1063-8210 © 2014 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
digital measurement scheme, whose basic structure has been
introduced in [18]. It measures clock jitter with high
and scalable resolution at limited costs, and with high accuracy
despite the presence of PSN.
The proposed approach is based on a scheme similar to [4],
with the following main differences: 1) the implementation
of the sampling elements [transfer gates (TGs) rather than
latches]; 2) the usage of multiple out-of-phase delay lines
in our scheme to increase resolution; and 3) the proposal
of a sampling strategy to avoid the impact of PSN on jitter
measurement.
Compared with [4], our scheme allows a 40% reduction in
both area overhead and power consumption. Instead, compared
with the approaches in [14] and [15], our scheme requires a
considerably lower area overhead, while featuring the same
measurement resolution.
Then, as introduced in [19], we show that, for high perfor-
mance microprocessors, the area required by our measurement
scheme can be further reduced by reusing and properly mod-
ifying part of the ROs often employed for PPV measurement.
Our scheme can be set in either the PPV measurement
mode, or the clock jitter measurement mode, by acting on
an external control signal. The effectiveness of our approach
has been verified using electrical level simulations, performed
considering a PSN up to 50% of the nominal power supply
voltage.
The rest of this paper is organized as follows. In Section II,
we present some basics on clock jitter. In Section III, we
introduce our proposed jitter measurement scheme, while in
Section IV, we report some of the results of the electrical
level simulations that we have performed to verify its correct
behavior. We show two possible implementations of our jitter
measurement scheme, one of which reuses the ROs that
are usually adopted in high performance microprocessors for
PPV measurement. In Section V, we evaluate the costs of
our scheme and we compare them with those of alternative
approaches recently proposed. Finally, we give some conclu-
sive remarks in Section VI.
II. JITTER AFFECTING CLOCK SIGNALS
Jitter is the deviation of a signal timing event from its
ideal position [20], causing displacements of clock transition
times. These displacements are categorized as either deter-
ministic, random, or both. We refer to the following jitter
definitions [21]: 1) timing jitter, which is the time difference
between the actual and ideal signal transition; 2) period jitter,
which is the time variation of the signal period from its average
value; and 3) cycle-to-cycle jitter, which is the variation in the
period of a signal within two following periods. It has been
shown that these jitter definitions are mathematically related
to each other [21]; therefore, in the remainder of this paper, we
will consider the period jitter only.
Let us first consider the jitter-free clock signal (denoted
by CK) with 50% duty-cycle. It can be described by
$$CK(t) = \begin{cases} 1, & 0 \le t < T_{CK}/2 \\ 0, & T_{CK}/2 \le t < T_{CK}. \end{cases} \tag{1}$$
Fig. 1. (a) Basic block structure of our scheme. (b) Schematic representation
of the propagation of the CK falling (left) and rising (right) edges within the
NOT chain.
In the jitter-free case, the CK high and low phase durations
[DCK−H and DCK−L in Fig. 1(b)] are equal to TCK/2. The
presence of jitter deviating the CK edge by a time ±J changes
the CK high phase duration to: DCK−H = TCK/2 ± J .
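As a minimal sketch of (1) and of the high phase duration above, the following Python fragment evaluates the jitter-free clock and DCK−H. The function names, the periodic extension of (1) beyond one period, the signed jitter argument, and the 336-ps period are illustrative assumptions rather than part of the original scheme.

```python
def ck(t, t_ck):
    """Jitter-free clock of (1), extended periodically (assumption):
    1 during the first half period, 0 during the second half."""
    return 1 if (t % t_ck) < t_ck / 2 else 0


def high_phase_duration(t_ck, jitter=0.0):
    """D_CK-H = T_CK/2 + J, with J signed (positive J widens the high phase)."""
    return t_ck / 2 + jitter


T_CK = 336.0  # ps, illustrative value close to the period of a 3-GHz clock

samples = [ck(t, T_CK) for t in (0.0, 100.0, 168.0, 335.0, 336.0)]
print(samples)                                 # clock value at a few instants
print(high_phase_duration(T_CK, jitter=7.0))   # widened high phase, in ps
```

Here a positive `jitter` stands for the "+J" case and a negative one for the "−J" case.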
III. PROPOSED JITTER MEASUREMENT SCHEME
We measure the duration of clock high and/or low phase(s)
over time, and compare the obtained results with those
expected for the case of jitter-free clock. For the sake of
brevity, we here present the scheme for the clock high phase
measurement only, which can be easily extended to measure
both the clock phases.
A. Scheme With Resolution Equal to a NOT Delay
The basic block structure of our proposed scheme is shown
in Fig. 1(a). The NOT chain implements a delay line delaying
the input CK, whose jitter has to be measured, by a given
amount of time. The outputs of the NOT gates are sampled by
the measurement sample (MS) block, when the control block
(CB) gives valid measure (VM) = 1. The output stage (OS)
produces the measurement encoded by a thermometer code.
By making RS = 1, CB resets the measurement after a time
long enough to allow the system to read it.
Denoting the delay of each NOT in the chain by τ , the
total chain delay is Nτ . The integer N is such that the total
delay covers the whole period TCK of the CK under jitter-free
conditions. Therefore, it is: N = TCK/τ. Considering the
clock signal CK(t) in (1), and denoting its complemented signal
by CK′(t), the signals pi (i = 1 . . . N) can be represented as
$$p_i(t) = \begin{cases} CK(t - (i+1)\tau), & i \text{ odd} \\ CK'(t - (i+1)\tau), & i \text{ even.} \end{cases} \tag{2}$$
The logic values simultaneously present at the outputs of
each NOT of the chain after a CK falling and rising edge
are shown in Fig. 1(b). Each row represents the snap-shot at
one specific time instant. The CK switches at time t1 and its
falling (rising) edge propagates through the chain. The position
within the chain of the CK falling (rising) edge is identified by
two successive zeroes, or two successive ones, whose location
moves progressively to the right. The duration of the CK high
phase is given by the number of NOTs within the chain that
the CK rising edge has to pass through, before the CK falling
edge arrives at the chain input.
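The snapshot idea can be illustrated with a short behavioral sketch that evaluates (2) at one time instant and looks for two successive equal values. It assumes an ideal periodic clock, integer picoseconds, a period rounded to N·τ = 336 ps, and hypothetical helper names; it is not the circuit itself.

```python
TAU = 12          # ps, NOT delay
N = 28            # number of taps p_1..p_N
T_CK = N * TAU    # 336 ps, assumption: period rounded to a multiple of TAU


def ck(t):
    """Ideal 50% duty-cycle clock with a rising edge at t = 0 (periodic)."""
    return 1 if (t % T_CK) < T_CK // 2 else 0


def taps(t):
    """Values of p_1..p_N at time t according to (2)."""
    word = []
    for i in range(1, N + 1):
        v = ck(t - (i + 1) * TAU)
        word.append(v if i % 2 == 1 else 1 - v)  # even taps carry CK'
    return word


def edge_positions(word):
    """1-based indices i such that p_i == p_(i+1): an edge lies between them."""
    return [i + 1 for i in range(len(word) - 1) if word[i] == word[i + 1]]


snapshot = taps(60)               # 60 ps after a CK rising edge
print(snapshot)
print(edge_positions(snapshot))   # one entry per edge travelling in the chain
```

Evaluating `taps` at increasing times moves the reported positions to the right, which mirrors the row-by-row representation of Fig. 1(b).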
To account for the effects of PSN, we have considered the
realistic model in [22] and [23]. It includes also the presence
of coupling capacitors, usually employed within the power
distribution network (PDN) to reduce the current return paths,
thus reducing PSN.
It is worth noting that the PDN characteristic impedance
seen from different locations inside the PSN topology may
exhibit significant differences [24]. Depending also on the
operating frequency, the package inductance or the decou-
pling capacitance might prevail. Particularly, when decoupling
capacitors are employed (either at on-chip level, or both at
on-chip and on-board level), the supply voltage waveform
presents a first triangular peak that is considerably higher than
the secondary peaks [25]. Therefore, for evaluation purpose,
we have realistically modeled the PSN as a train of narrow
triangular pulses [22], whose width depends on the number of
switching gates at a single clock period. Moreover, since the
pulsewidth is always very small, the PSN can be modeled as
an impulse train with a uniformly distributed random shift in
[0, tr ], where tr is the rise-time of the clock signal [22]. In our
scheme, to reduce the effects of PSN on jitter measurement,
MS samples the values present on signals pi (i = 1 . . . N)
when the CK falling edge arrives at the input of the second
NOT of the chain (pS), rather than at the input of the first NOT.
This allows the PSN to vanish before sampling. The sampling
instant is identified by the condition pS = p1 = 1.
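A minimal sketch of the PSN timing model described above is given below. It assumes one noise impulse per clock period, placed at the clock rising edge plus a random shift drawn uniformly in [0, tr]; the function name, the seed handling, and the numeric values are illustrative assumptions.

```python
import random


def psn_impulse_times(n_cycles, t_ck, t_rise, seed=0):
    """Return one PSN impulse time per clock cycle (assumption), each shifted
    by a uniformly distributed random amount in [0, t_rise]."""
    rng = random.Random(seed)
    return [k * t_ck + rng.uniform(0.0, t_rise) for k in range(n_cycles)]


# Illustrative values in ps: about 333 ps period, 20 ps clock rise time.
times = psn_impulse_times(n_cycles=5, t_ck=333.0, t_rise=20.0, seed=1)
print(times)
```

Since each impulse falls within tr of a clock edge, delaying the sampling instant by one NOT delay leaves time for the disturbance to vanish, provided the impulse is short compared with τ.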
PSN may also influence the delay of some NOT gates
while the clock edge travels through the delay line. Using
electrical level simulations, we have verified that only the NOT
propagating the CK edge when the PSN occurs is impacted,
while the sampling circuitry is not. The variation in the delay
of the NOT propagating the CK edge determines the impact of
PSN on measurement accuracy. We have determined that such
a NOT delay variation impacts the jitter measurement accuracy
of our scheme by only 6.1%. Therefore, we have considered
a constant delay τ of the NOTs in the mathematical model of
our scheme behavior.
The useful bits representing the jitter measurement start
from the output of the second NOT of the chain, denoted by p1.
The output pS , together with its associated signal outS is used
by the control block CB to determine the sampling instant.
Our scheme samples the outputs of the NOT chain at a time
instant denoted by tS , after the CK rising edge. It is
$$t_S = D_{CK-H} + \tau = T_{CK}/2 \pm J + \tau. \tag{3}$$
At time tS , CB asserts VM, and the values on signals pS and
pi (i = 1 . . . N) are sampled by MS and provided as outputs
on outS and outi (i = 1 . . . N), respectively. We determine the
logic values sampled on each outi by making t = tS in (2).
Fig. 2. Word produced by our scheme at the outputs of OS.
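We obtain
$$out_i = p_i(t_S) = \begin{cases} CK\left(\frac{T_{CK}}{2} \pm J - i\tau\right), & i \text{ odd} \\ CK'\left(\frac{T_{CK}}{2} \pm J - i\tau\right), & i \text{ even} \end{cases} \qquad (i = 1 \ldots N). \tag{4}$$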
After sampling, the OS block gives at its outputs
$$o_{Ri} = CK'\left(\frac{T_{CK}}{2} \pm J - i\tau\right), \qquad (i = 1 \ldots N). \tag{5}$$
The word on oRi (i = 1 . . . N) is encoded by the thermometer
code, as shown in Fig. 2. It consists of a number of 0s
equal to i0, followed by (N − i0) 1s, so that
$$o_{Ri} = \begin{cases} 0, & \text{if } 1 \le i \le i_0 \\ 1, & \text{if } i_0 < i \le N. \end{cases} \tag{6}$$
According to (5), oRi = 0 (∀i ≤ i0) if the argument of CK′
is greater than or equal to 0. Thus, we can simply obtain the
value of i0 by equating the argument of CK′ in (5) to 0, that
is: TCK/2 ± J − iτ = 0, for i = i0. Therefore, it is
$$i_0 = \frac{1}{\tau}\left(\frac{T_{CK}}{2} \pm J\right). \tag{7}$$
The resolution (Res) of our scheme is given by the minimum
variation in the CK high phase duration resulting in one
more 0 (1) at the outputs oRi . The Res value can be determined
as the difference between the arguments of oRi and oR(i+1),
when it is oRi = 0 and oR(i+1) = 1. Therefore
$$Res = \left(\frac{T_{CK}}{2} \pm J - i\tau\right) - \left(\frac{T_{CK}}{2} \pm J - (i+1)\tau\right) = \tau. \tag{8}$$
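The relation between the high phase duration, the number of zeros i0 in (7), and the resolution in (8) can be sketched as follows. The sketch works in integer picoseconds and takes i0 as the integer part of (7); both choices, and the function name, are assumptions made for illustration.

```python
def thermometer_word(high_phase_ps, tau_ps, n_taps):
    """Word o_R1..o_RN of (6): i0 zeros followed by N - i0 ones, with
    i0 taken as the integer part of D_CK-H / tau (assumption)."""
    i0 = min(high_phase_ps // tau_ps, n_taps)
    return [0] * i0 + [1] * (n_taps - i0)


TAU = 12      # ps
N = 28
D_NOM = 168   # ps, jitter-free high phase (T_CK/2)

for jitter in (0, 11, 12):
    word = thermometer_word(D_NOM + jitter, TAU, N)
    print(jitter, word.count(0))
```

The zero count changes only when the high phase grows by a full τ, which is the resolution stated in (8).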
The thermometer encoding produced by our scheme makes it
easy to derive the clock jitter measurement. The encoded
word oRi (i = 1 . . . N) can be compared with that expected
in the case of jitter-free CK through N parallel XORs. The
comparison results in an N-bit vector with a number of 1s
equal to the difference between the number of 0s in the
produced encoded word and in the expected one. The number
of 1s can be counted, and the jitter measurement can be obtained
by multiplying it by the scheme resolution. After a time long
enough to allow the system to read the performed measurement, the
scheme can be reset by asserting RS, thus making it ready for
the following measurement. We assume that signal RS is activated
every other CK cycle. We use a periodic generated reset signal
(GR) with half the CK frequency to generate the RS pulse
upon its rising edge. The timing of these signals is shown in
Fig. 3.
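The XOR-based decoding described above can be sketched in a few lines. The function name is hypothetical, and the sketch returns only the magnitude of the jitter; its sign could be recovered by checking which word holds more zeros.

```python
def measure_jitter(word, expected_word, resolution_ps):
    """Bitwise XOR of the produced and expected thermometer words, then
    count the 1s and scale by the resolution of the scheme."""
    diff = [a ^ b for a, b in zip(word, expected_word)]
    return sum(diff) * resolution_ps


N = 28
expected = [0] * 14 + [1] * (N - 14)   # jitter-free word (14 zeros)
produced = [0] * 15 + [1] * (N - 15)   # high phase widened by one NOT delay

print(measure_jitter(produced, expected, resolution_ps=12))
```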
Fig. 3. Schematic representation of the timing of signals VM and RS .
Fig. 4. (a) Block structure of our proposed scheme providing higher
measurement resolution than that in Fig. 3. (b) Representation of the signals
produced at the outputs of its NOT chains.
B. Scheme With Resolution of Half NOT Delay
To obtain a resolution higher than a NOT delay (τ ), we
replace the basic block scheme in Fig. 1(a) with that in
Fig. 4(a), employing two NOT chains, rather than one. NOT
chain 1 consists of NOTs each with a delay equal to τ . Instead,
in NOT chain 2, the first NOT has a delay equal to (1 + ½)τ ,
while all other NOTs have a delay equal to τ . This way, the
outputs of the corresponding NOTs of the two chains have a
τ /2 phase difference. The timings of the signals VM and RS
of our scheme in Fig. 4(a) are the same as shown in Fig. 3.
Considering the signal CK(t) in (1), we represent signals
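pji (i = 1 . . . N; j = 1, 2) as a function of time as follows:
$$\begin{aligned} p_{1i}(t) &= \begin{cases} CK(t - (i+1)\tau), & i \text{ odd} \\ CK'(t - (i+1)\tau), & i \text{ even} \end{cases} \\ p_{2i}(t) &= \begin{cases} CK'\left(t - \left(i + \tfrac{1}{2}\right)\tau\right), & i \text{ odd} \\ CK\left(t - \left(i + \tfrac{1}{2}\right)\tau\right), & i \text{ even} \end{cases} \end{aligned} \qquad (i = 1 \ldots N). \tag{9}$$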
As described before, to achieve low sensitivity to PSN, the
values at the outputs of the NOT chains are sampled by MS
at time tS = TCK/2 ± J + τ . This occurs when the CK falling
edge arrives at the input pS of the second NOT of chain 1
(i.e., when pS = p12 = 1). MS receives the signal pS together
with the 2N signals pji (i = 1 . . . N; j = 1, 2) from the
NOT chains, and it outputs the signal outS (the sampled value
of pS), together with the signals outm (m = 1 . . . 2N). These
latter signals are the sampled values of p21, p11, p22, p12, p23,
and so on, which represent the jitter measure. At the sampling
instant tS , it is
$$out_m = \begin{cases} p_{2\lceil m/2 \rceil}(t_S), & m \text{ odd} \\ p_{1(m/2)}(t_S), & m \text{ even} \end{cases} \qquad (m = 1 \ldots 2N). \tag{10}$$
Such signals feed the block OS, which performs the same
function as in (5). This way, the jitter measurement on oRm
(m = 1 . . . 2N) is encoded by a thermometer code. It is
$$o_{Rm} = CK'\left(\frac{T_{CK}}{2} \pm J - \frac{m}{2}\tau\right), \qquad o_{Rm} = \begin{cases} 0, & 1 \le m \le m_0 \\ 1, & m_0 < m \le 2N \end{cases} \tag{11}$$
where m0 is the order of the last oRm = 0. According to
(11) and Fig. 1(b), it is oRm = 0, if the argument of CK′ is
greater than or equal to 0. The value of m0 can be obtained
by equating the argument of CK′ in (11) to 0. It is
$$m_0 = \frac{2}{\tau}\left(\frac{T_{CK}}{2} \pm J\right). \tag{12}$$
The Res of our scheme in Fig. 4(a) can be expressed as the
difference between the arguments of oRm and oR(m+1), when
it is oRm = 0 and oR(m+1) = 1. From (11), it derives that
$$Res = \left(\frac{T_{CK}}{2} \pm J - \frac{m}{2}\tau\right) - \left(\frac{T_{CK}}{2} \pm J - \frac{(m+1)}{2}\tau\right) = \frac{\tau}{2}. \tag{13}$$
Therefore, the resolution of our measurement scheme can
be scaled by properly adding a NOT chain to the scheme in
Fig. 1(a), and by properly sizing the NOTs.
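The interleaving of the two chains and the resulting thermometer word can be sketched behaviorally as follows. The sketch applies the sign test on the argument of CK′ used to derive (12), with the tap delays that follow from (9) evaluated at tS; the function names are hypothetical and the values (τ = 12 ps, N = 28, TCK/2 = 168 ps) are illustrative.

```python
from fractions import Fraction


def os_bit(d_high, delay):
    """Bit after the OS block: 0 while D_CK-H - delay >= 0, else 1."""
    return 0 if d_high - delay >= 0 else 1


def two_chain_word(d_high, tau, n_taps):
    """Interleave chain 2 and chain 1 in the order p21, p11, p22, p12, ..."""
    half = Fraction(1, 2)
    chain1 = [os_bit(d_high, i * tau) for i in range(1, n_taps + 1)]
    chain2 = [os_bit(d_high, (i - half) * tau) for i in range(1, n_taps + 1)]
    word = []
    for b2, b1 in zip(chain2, chain1):
        word += [b2, b1]       # consecutive outputs are tau/2 apart
    return word


TAU, N, D_NOM = 12, 28, 168    # ps, taps per chain, ps

print(two_chain_word(D_NOM, TAU, N).count(0))       # jitter-free case
print(two_chain_word(D_NOM + 7, TAU, N).count(0))   # high phase widened by 7 ps
```

One extra zero corresponds to one resolution step of τ/2 = 6 ps, so a 7-ps widening is reported as 6 ps in this model.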
C. Scheme With Resolution Higher Than Half NOT Delay
Let us consider the general case of n chains. Chain 1 still
consists of NOTs each with a delay equal to τ . As for the
remaining n−1 NOT chains, the first NOT of the j th NOT chain
( j = 2 . . . n) has a delay d j1 = (1+( j−1)/n)τ ,
while all other NOTs have a delay equal to τ .
By considering as output the alternated succession of
the n NOT chain outputs (i.e., p21, p31, . . . , pn1, p12, p22,
p32, . . . , pn2, etc.), any two following outputs will have a
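phase difference equal to τ/n. The expressions of signals pji
(i = 1 . . . N; j = 1 . . . n), as a function of time, are
$$\begin{aligned} p_{1i}(t) &= \begin{cases} CK(t - (i+1)\tau), & i \text{ odd} \\ CK'(t - (i+1)\tau), & i \text{ even} \end{cases} & (i = 1 \ldots N) \\ p_{ji}(t) &= \begin{cases} CK'\left(t - \left(i + \tfrac{j-1}{n}\right)\tau\right), & i \text{ odd} \\ CK\left(t - \left(i + \tfrac{j-1}{n}\right)\tau\right), & i \text{ even} \end{cases} & (i = 1 \ldots N;\ j = 2 \ldots n). \end{aligned} \tag{14}$$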
Extending the function of the OS block (11) to the case of
n NOT chains, we obtain:
$$m_0 = \frac{n}{\tau}\left(\frac{T_{CK}}{2} \pm J\right). \tag{15}$$
The resolution of our scheme with n NOT chains is given by
the difference between the arguments of oRm and oR(m+1),
when oRm = 0 and oR(m+1) = 1. From (15), it is
$$Res = \left(\frac{T_{CK}}{2} \pm J - \frac{m}{n}\tau\right) - \left(\frac{T_{CK}}{2} \pm J - \frac{(m+1)}{n}\tau\right) = \frac{\tau}{n}. \tag{16}$$
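As a rough numeric illustration of the n-chain case, the sketch below computes the first-NOT delay of each chain, the resolution in (16), and the zero count in (15). Taking m0 as the integer part of (15) and the choice n = 4 are assumptions; the function names are hypothetical.

```python
from fractions import Fraction


def first_not_delays(tau, n_chains):
    """d_j1 = (1 + (j - 1)/n) * tau for j = 1..n (j = 1 is the plain chain)."""
    return [(1 + Fraction(j - 1, n_chains)) * tau for j in range(1, n_chains + 1)]


def resolution(tau, n_chains):
    """Res = tau / n, as in (16)."""
    return Fraction(tau, n_chains)


def zeros_count(d_high, tau, n_chains):
    """m0 of (15), taken as the integer part of (n / tau) * D_CK-H."""
    return (n_chains * d_high) // tau


TAU, N_CHAINS = 12, 4                       # ps, illustrative number of chains
print(first_not_delays(TAU, N_CHAINS))      # first-NOT delay of each chain, ps
print(resolution(TAU, N_CHAINS))            # ps
print(zeros_count(168, TAU, N_CHAINS), zeros_count(175, TAU, N_CHAINS))
```

With four chains, a 7-ps widening of the high phase moves the zero count by two steps of 3 ps in this model.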
The achievement of an increasingly higher resolution by
augmenting the number of NOT chains is limited by the
Fig. 5. Possible implementation. (a) NOT chains. (b) MS and OS blocks.
Fig. 6. (a) Implementation of the considered NOTs with programmable delay.
(b) Propagation delay variation in case of PPVs as a function of the program
signals (A, B, and C).
difficulty in controlling the NOT delays, due to PPV. To solve
this issue, the NOT chains can be implemented using balanced
delay lines [26], or inverters whose delay can be calibrated
after fabrication [27].
IV. IMPLEMENTATION AND VERIFICATION
Two possible implementations of our scheme to measure
the CK high phase are here described. We consider a standard
65-nm CMOS technology [28], VDD = 1.1 V, 3-GHz clock
frequency and two NOT chains [Fig. 4(a)]. Particularly, first,
we introduce a possible implementation, in which we have
designed all the blocks required by our scheme (and introduced
in Section III). Then, we present a possible implementation
reusing ROs usually present in high performance microproces-
sors for PPV measurement. Finally, we show some results
of the HSPICE electrical level simulations that we have
performed to verify the behavior of our scheme.
A. Implementation Without Reusing ROs
Let us consider the two NOT chains [Fig. 5(a)]. In chain 1,
all NOTs exhibit a delay τ ≅ 12 ps; in chain 2, the first NOT
has a delay equal to (1 + ½)τ ≅ 18 ps, while all other NOTs
have a delay τ . Since each chain needs to cover TCK (≅333 ps),
the number of NOTs within each chain is N = TCK/τ ≅ 28.
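The chain length follows from simple arithmetic, sketched below. Rounding up to the next integer is an assumption made here so that the chain covers the whole period; the variable names are illustrative.

```python
import math

f_ck_ghz = 3.0                  # clock frequency
tau_ps = 12.0                   # NOT delay
t_ck_ps = 1000.0 / f_ck_ghz     # clock period, about 333.3 ps

# Smallest number of NOTs whose total delay covers one clock period.
n_taps = math.ceil(t_ck_ps / tau_ps)
first_not_chain2_ps = 1.5 * tau_ps   # (1 + 1/2) * tau for the second chain

print(n_taps, n_taps * tau_ps, first_not_chain2_ps)
```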
As for MS and OS, their possible implementation is shown
in Fig. 5(b). The inputs of MS (pS and p j i ) are connected to
its outputs (outS and outm , respectively) through TGs driven
by VM and VM’. This way, when VM = 0, all TGs conduct
and connect the outputs of the NOT chains to signals outS and
outm . Instead, when VM flips to 1, all TGs are turned off, so
that the outputs of the NOT chains are sampled. Signals outS
and outm remain in a high impedance state, keeping the logic
values latched until reset.
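The track-and-hold behavior of one TG in the MS block can be captured by a small behavioral model. This is a logic-level sketch only, with a hypothetical class name; it ignores charge leakage, setup and hold times, and other electrical effects.

```python
class TGSampler:
    """Behavioral model of one transfer gate of MS: transparent while
    VM = 0, holding the last tracked value while VM = 1."""

    def __init__(self):
        self.out = None   # unknown until the gate has conducted once

    def update(self, p, vm):
        if vm == 0:       # TG conducts: the output follows the NOT chain tap
            self.out = p
        return self.out   # TG off: the stored value is kept


tg = TGSampler()
trace = [tg.update(p, vm) for p, vm in [(1, 0), (0, 1), (1, 1), (0, 0)]]
print(trace)
```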
Fig. 7. Possible implementation of circuits generating (a) VM and VM’ and
(b) RS ’ and RS .
As for OS, it buffers the outm signals and encodes them by
a thermometer code on signals oRm . The sampled data must
be maintained for one clock cycle only. Therefore, dynamic
latches have been considered, rather than more costly static
latches, to reduce implementation costs. When RS is asserted,
VM flips to 0, making all TGs conductive again. As a result, all
signals oRm become equal to 1, removing the previous
measurement results.
The outputs of the NOTs at the same level i (i = 1 . . . N)
within the j th chain ( j = 1, 2) present a phase difference of
τ/2 ≅ 6 ps. According to (13), this is also the resolution provided
by our scheme. To compensate for possible PPV occurring
during manufacturing, the inverter chains have been implemented
by NOTs with a programmable delay [27], as shown in
Fig. 6(a). Using Monte Carlo simulations, we have evaluated
the variations in the NOT delay due to PPV. The achieved
results are shown in Fig. 6(b). We can observe that the variations
of the NOT delay are within a range of approximately
±20% of its nominal value of 12 ps. Fig. 6(b) also
shows that, by properly setting to 1 the program signals
(i.e., A, B, and C) of the considered NOT, the delay of the
NOT can be adjusted to compensate for the variations due to PPV.
PPV may also imbalance the low-to-high and high-to-low
transitions of the NOTs. However, we have verified this neg-
ligibly impacts the duty-cycle of the clock (by less than 1%),
whose jitter is being measured. The delay of the NOTs is
also sensitive to voltage and temperature variations. Such a
sensitivity may be reduced by employing one of the techniques
that have been proposed in the literature for the on-chip
compensation of voltage and temperature variations (e.g., that
in [29]).
As for CB, Fig. 7(a) and (b) shows an implementation of the
circuits generating VM, RS and their complemented signals.
Signals VM and VM’ should be fine-tuned to avoid any
systematic error. Their delays can be equalized by considering
the scheme in Fig. 7(a). Signal VM (VM’) should flip to 1 (0)
when pS = p12 = 1, and it should be kept at this value till
reset. This can be obtained by exploiting signals outS and out4
generated at the output of MS, that remain latched at the high
logic value till reset. The reset signal RS (and R′S ) is activated
every other CK cycle, and may be implemented by the circuit
in Fig. 7(b). The signal GR can be obtained by a standard
divide-by-2 circuit [30].
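At the cycle level, the generation of RS from GR can be sketched as follows. The model assumes that GR starts at 0 and toggles on every CK rising edge, and that RS is asserted in the cycles where GR rises; the function name and the initial state are assumptions, not a description of the circuit in Fig. 7(b).

```python
def reset_cycles(n_cycles):
    """Return, for each CK cycle, 1 if RS is pulsed in that cycle, else 0."""
    gr = 0                        # assumed initial state of the divide-by-2
    rs = []
    for _ in range(n_cycles):
        prev, gr = gr, gr ^ 1     # GR toggles on every CK rising edge
        rs.append(1 if (prev == 0 and gr == 1) else 0)   # pulse on GR rising
    return rs


print(reset_cycles(6))
```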
B. Implementation Reusing ROs
We refer to the PPV measurement strategy in [16]: it
consists of many functional unit blocks (FUBs), each com-
posed by q ROs with K (usually equal to 99) NOTs.
The FUB internal structure is shown in Fig. 8 [12]. The NOTs
Fig. 8. Internal structure of the FUBs in [12].
Fig. 9. Implementation of our jitter measurement scheme with two reused
ROs of the scheme, for the case of Res = τ/2.
within each RO are equal to each other. Instead, the ROs within
the same FUB are generally different to allow a more accurate
measurement of PPV [16]. Some ROs consist of minimum
sized NOTs, others are 2X, 3X, and so forth.
As an example, let us consider the reuse of two ROs,
allowing us to achieve a measurement resolution equal to τ/2,
as shown in Fig. 9. The delay of each NOT chain should cover
the whole TCK , thus N = 28 NOTs out of the 99 available
are reused. We modified the FUB in Fig. 8 by connecting
multiplexers M1 and M2 to the input of the NAND gates
N1 and N2, respectively. This way, by externally acting on
the control signal JT, our scheme can be easily set in either
the PPV measurement mode (JT = 1), or the clock jitter
measurement mode (JT = 0). Blocks MS, OS, and CB in
Fig. 9 are the same as in our implementation without reusing ROs.
CK (CKd) must propagate through M1 (M2) and N1 (N2)
before entering the NOT chain. As discussed in Section III,
the jitter measure is sampled when pS = p11 = 1 and the CK
falling edge arrives at the input of the first NOT of the upper
RO, thus making it immune to PSN. As for the NOTs of the
chains, we cannot implement them featuring a programmable
delay to compensate PPV. However, by initially configuring
the FUBs in the PPV measurement mode, we can determine
the variation in the delay of the NOTs of the ROs over the
nominal value. This makes it possible to correct possible clock
jitter measurement errors induced by the presence of PPV.
C. Verification
We show some of the results of the HSPICE simulations
that we have performed to verify the behavior of our jitter
Fig. 10. Simulation results for nominal values of electrical parameters,
considering a PSN of 50% of VDD .
measurement scheme, considering both the implementations
in Sections IV-A and IV-B. The PSN has been modeled as
described in Section III-A, with a peak value of 50% of VDD.
We also account for the setup and hold times of the sampling
circuits, whose values are: tsetup ≅ 2.2 ps, thold ≅ 10 fs.
Our scheme without reusing ROs produces an output oRm as
in (11). Thus, when no jitter affects CK (i.e., J = 0), outputs
oRm are encoded by a thermometer code with a number of
zeros equal to m0 = TCK/τ = (TCK/2)/(τ/2) ≅ 168 ps/6 ps = 28. Fig. 10
shows the simulation results considering the case with no jitter
affecting the first measured CK high phase (CK HP 1), and a
jitter of 7 ps widening the second measured CK high phase
(CK HP 2). As expected, when no jitter occurs (CK HP 1),
while VM = 1 (Valid meas 1) our scheme outputs a word
encoded by the thermometer code with 28 zeros (i.e., oRm = 0
for 1 ≤ m ≤ m0 = 28, oRm = 1 for 29 ≤ m ≤ 58). Since we
measure jitter as the difference in the number of zeros between
the produced output and the one expected for the jitter-free
case (equal to 28) multiplied by the resolution of our scheme
(equal to 6 ps), we correctly obtain a measurement of jitter
equal to 0 ps.
Instead, when for instance a jitter of J = 7 ps affects CK
(CK HP 2), while VM = 1 (Valid meas 2) our scheme outputs
a word encoded by the thermometer code with 29 zeros (i.e.,
oRm = 0 for m = 1 . . . 29, oRm = 1 for m = 30 . . . 58), thus
resulting in a jitter measure equal to J = 6 ps, with 1-ps
measurement error. Therefore, our scheme is able to measure
jitter with the expected resolution of 6 ps, even in the presence
of PSN.
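The two simulated cases can be checked against (12) with a short worked calculation. Taking the integer part of (12) is an assumption about how the thermometer word quantizes the result; the values 168 ps and 6 ps are those used above.

```latex
\begin{aligned}
m_0 &= \left\lfloor \frac{2}{\tau}\left(\frac{T_{CK}}{2} + J\right) \right\rfloor
     = \left\lfloor \frac{168\,\text{ps} + 7\,\text{ps}}{6\,\text{ps}} \right\rfloor
     = \lfloor 29.17 \rfloor = 29, \\
J_{\text{meas}} &= (29 - 28)\cdot\frac{\tau}{2} = 6\,\text{ps}, \qquad
J - J_{\text{meas}} = 7\,\text{ps} - 6\,\text{ps} = 1\,\text{ps}.
\end{aligned}
```

The 1-ps error is therefore a quantization effect, and it stays below the 6-ps resolution step.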
As for the implementation of our scheme reusing ROs, we
have verified that:
1) the PPV measurement accuracy of the original FUB in
[16] is not degraded;
2) the clock jitter measurement accuracy is the same as for
the implementation of our scheme without reusing ROs.
As for point 1), we have compared the oscillation period of
the ROs of the original FUB (TRO_orig) with that of the ROs
Fig. 11. Oscillation periods of the ROs in [12] (TRO_orig), and of the
ROs reused by our jitter measurement scheme (TRO_our), as a function of
(a) threshold voltage Vth and (b) oxide thickness Tox.
Fig. 12. Simulation results for nominal values of electrical parameters,
considering a PSN of 50% of VDD .
reused by our scheme for jitter measurements (TRO_our ), as a
function of parameters Vth and Tox. Fig. 11(a) and (b) shows
TRO_orig and TRO_our , as a function of Vth and Tox variations
(Vth and Tox) up to ±30% of their nominal values.
As we can see, the relative difference between TRO_our and
TRO_orig is always negligible, with 4% maximum increase with
Vth variation, and 3% with Tox variation. Similar results have
been achieved for other PPVs. Therefore, the reuse of the ROs
of the FUB in [16] to allow also clock jitter measurement does
not impact the PPV measurement accuracy.
As for point 2), since all NOTs of the two reused and
modified ROs present a delay τ of 12 ps, our scheme
should provide a clock jitter measurement resolution of
Res = τ/2 ≅ 6 ps. Fig. 12 shows the simulation results
obtained for nominal values of electrical parameters and with
a PSN of 50% of VDD. The two cases of no clock jitter
(CK HP 1), and clock with a jitter of 7 ps widening the
second measured CK high phase (CK HP 2) are depicted.
As expected, with no jitter, while VM = 1 (Valid meas 1) our
scheme provides on oRm the same encoded word with 28 zeros
as for the implementation without reusing ROs. Analogously,
the same results have been obtained considering a jitter of 7 ps
TABLE I
AREA AND POWER COSTS OF THE COMPARED SCHEMES, AND RELATIVE
REDUCTIONS ((%) = 100([4, 14]−OUR)/[4, 14])
affecting the CK high phase (CK HP 2), with a word encoded
by the thermometer code containing 29 zeros. Therefore, our
jitter measurement scheme implemented by reusing the ROs
of the FUB is able to measure clock jitter with the same
resolution as the scheme in Section IV-A.
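As a hedged illustration of how the encoded words above could be mapped to a jitter value, the following Python sketch counts the zeros in a thermometer-coded word and converts the difference with respect to the jitter-free word into time. The function names, the 32-bit word length, and the linear mapping of one additional zero to one resolution step are assumptions made for illustration only; τ = 12 ps, Res = τ/2, and the zero counts 28 and 29 are the values reported in the text.

```python
TAU_PS = 12.0            # NOT delay reported in the text
RES_PS = TAU_PS / 2      # resolution with the two reused ROs (6 ps)


def count_zeros(word: str) -> int:
    """Return the number of '0' symbols in a thermometer-coded word."""
    return word.count("0")


def jitter_estimate_ps(ref_word: str, meas_word: str,
                       res_ps: float = RES_PS) -> float:
    """Quantized jitter estimate, assuming one extra zero = one resolution step."""
    return (count_zeros(meas_word) - count_zeros(ref_word)) * res_ps


WORD_LEN = 32  # hypothetical word length, not given in the text
word_no_jitter = "0" * 28 + "1" * (WORD_LEN - 28)   # CK HP 1 case
word_with_jitter = "0" * 29 + "1" * (WORD_LEN - 29)  # CK HP 2 case

print(jitter_estimate_ps(word_no_jitter, word_with_jitter))  # 6.0
```

Under these assumptions, the 7 ps jitter is reported as one step of 6 ps, that is, with a quantization error smaller than Res.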
V. COSTS AND COMPARISON
We have evaluated the costs of our proposed scheme,
implemented with and without reusing ROs, in terms of
additional area and power consumption. We have compared
it with the schemes in [4], [14], and [15]. Since neither
implementation details nor costs are reported in [13], it has
not been considered for comparison.
As for [4] and [14], they feature the same resolution as our
approach implemented with one NOT chain, which for the
considered 65-nm CMOS technology is equal to 12 ps.
For comparison purposes, the latches and logic gates of the
scheme in [4] have been implemented as the standard latch
in [31], and minimum sized symmetric logic gates. The area
of our scheme and [4] has been roughly estimated as the
gate area of all transistors, while their power consumption
has been evaluated by HSPICE simulations. As for [14], we
considered the costs reported by the authors, which refer to
a true implementation on a test chip with a 65-nm CMOS
technology.
The obtained results are shown in Table I. As can be
observed, when our scheme does not reuse the ROs of the
FUBs, it allows a 40% reduction in both additional area and
power over [4]. Compared with [14], our approach allows a
99.7% reduction in additional area. Its power consumption, instead,
is only 11% higher than that of [14], for an operating
frequency 30 times higher (3 GHz versus 100 MHz).
On the other hand, when our approach is implemented
by reusing the ROs of the FUBs, it allows 74% and 30%
reduction in area and power consumption, respectively, over
the approach in [4]. Compared with [14], in this case, our
approach allows a 99.9% reduction in area. Its power
consumption, instead, is 28% higher, for an operating frequency
30 times higher. Note that the area reported for the scheme in [14]
refers to a true implementation on a test chip, while that of our
scheme is a rough estimate of the gate area of all transistors.
From Table I, it can be noticed that by reusing the ROs of
the FUBs to implement our jitter measurement scheme, we
obtain a considerable reduction of additional area over our
scheme not reusing the ROs (approximately 55%). However,
TABLE II
AREA AND POWER COSTS OF THE COMPARED SCHEMES, AND RELATIVE
REDUCTIONS ((%) = 100([15]−OUR)/[15])
the reuse of ROs requires a limited increase in power consumption
(approximately 15%), compared with the case with
no RO reuse. This is because our scheme without RO reuse
is implemented with a NOT chain composed of only 29 NOTs
(Section IV), while our scheme reusing the ROs employs all
K = 99 NOTs of the reused RO.
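The relative figures quoted above can be cross-checked against each other with a short calculation. The Python sketch below applies the relative reduction formula from the table captions, 100(reference − ours)/reference, to costs normalized to those of [4]. The normalization and the variable names are assumptions introduced here for illustration; the input percentages (40%, 74%, and 30%) are those stated in the text.

```python
def relative_reduction(reference: float, ours: float) -> float:
    """Relative reduction in percent: 100 * (reference - ours) / reference."""
    return 100.0 * (reference - ours) / reference


# Costs normalized so that the scheme in [4] equals 1.0 (assumed normalization).
area_no_reuse = 1.0 - 0.40    # 40% area reduction over [4], no RO reuse
area_reuse = 1.0 - 0.74       # 74% area reduction over [4], with RO reuse
power_no_reuse = 1.0 - 0.40   # 40% power reduction over [4], no RO reuse
power_reuse = 1.0 - 0.30      # 30% power reduction over [4], with RO reuse

# Area saved and extra power spent when moving from no reuse to RO reuse.
area_saving = relative_reduction(area_no_reuse, area_reuse)
power_increase = -relative_reduction(power_no_reuse, power_reuse)

print(round(area_saving, 1), round(power_increase, 1))  # 56.7 16.7
```

Both values agree, within the rounding of the reported percentages, with the approximately 55% area reduction and approximately 15% power increase stated for the RO reuse case.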
As for [15], it features the same resolution as our approach
implemented with four NOT chains, which is equal to 3 ps for the
considered 65-nm CMOS technology. The results are shown
in Table II. We can observe that, if our scheme does not reuse
ROs, it allows a 94% reduction in additional area, while by
reusing four ROs of the FUB, it allows a 97% area reduction.
Similar to [14], the area reported for the scheme in [15] in
Table II refers to a true implementation on a test chip.
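The resolution values used in these comparisons (12 ps with one NOT chain, 6 ps with two, and 3 ps with four) are consistent with a resolution that scales as the NOT delay divided by the number of chains. The Python sketch below encodes this relation; the function name and the general form Res = τ/n are assumptions inferred from these three data points, not a formula taken from this section.

```python
def resolution_ps(tau_ps: float, n_chains: int) -> float:
    """Assumed resolution model: NOT delay divided by the number of NOT chains."""
    if n_chains < 1:
        raise ValueError("at least one NOT chain is required")
    return tau_ps / n_chains


TAU_PS = 12.0  # NOT delay for the considered 65-nm CMOS technology

for n in (1, 2, 4):
    print(n, resolution_ps(TAU_PS, n))
```

For τ = 12 ps, this yields 12 ps, 6 ps, and 3 ps for one, two, and four chains, matching the resolutions quoted for the comparisons with [4] and [14], for the RO reuse case, and for [15], respectively.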
VI. CONCLUSION
We have proposed an on-chip clock jitter measurement
scheme for high performance microprocessors. The scheme
enables in situ jitter measurement during the test or debug
phase. It achieves a very high and scalable measurement
resolution and accuracy, despite the presence of PSN. We
have shown that, when our scheme is implemented to feature
the same resolution as the previous approach in [4], it allows a
40% reduction in both area and power consumption. Instead,
compared with the approaches in [14] and [15], our scheme
requires a considerably lower area overhead, while featuring
the same measurement resolution.
We have also shown that, for the case of microprocessors
employing ROs to measure PPVs, our jitter measurement
scheme can be implemented by reusing part of the ROs, thus
allowing a 55% reduction of additional area over our scheme
not reusing the ROs.
REFERENCES
[1] S. Tam, S. Rusu, U. Desai, R. Kim, J. Zhang, and I. Young, “Clock
generation and distribution for the first IA-64 microprocessor,” IEEE
J. Solid-State Circuits, vol. 35, no. 11, pp. 1545–1552, Nov. 2000.
[2] C. Metra, D. Rossi, and T. M. Mak, “Won’t on-chip clock calibration
guarantee performance boost and product quality?” IEEE Trans. Com-
put., vol. 56, no. 3, pp. 415–428, Mar. 2007.
[3] J. M. Cazeaux, M. Omaña, and C. Metra, “Novel on-chip circuit for
jitter testing in high-speed PLLs,” IEEE Trans. Instrum. Meas., vol. 54,
no. 5, pp. 1779–1788, Oct. 2005.
[4] R. Franch et al., “On-chip timing uncertainty measurements on IBM
microprocessors,” in Proc. IEEE ITC, Oct. 2008, pp. 1–7.
[5] C. Metra, L. Schiano, and M. Favalli, “Concurrent detection of power
supply noise,” IEEE Trans. Rel., vol. 52, no. 4, pp. 469–475, Dec. 2003.
[6] Finding Sources of Jitter with Real-Time Jitter Analysis, Agilent Tech-
nologies, Santa Clara, CA, USA, 2007.
[7] M. Bhushan, A. Gattiker, M. Ketchen, and K. Das, “Ring oscillators for
CMOS process tuning and variability control,” IEEE Trans. Semicond.
Manuf., vol. 19, no. 1, pp. 10–18, Feb. 2006.
[8] N. A. Kurd, J. S. Barkatullah, R. O. Dizon, T. D. Fletcher, and
P. D. Madland, “A multigigahertz clocking scheme for the Pentium 4
microprocessor,” IEEE J. Solid-State Circuits, vol. 36, no. 11,
pp. 1647–1653, Nov. 2001.
[9] M. Omaña, D. Rossi, and C. Metra, “Fast and low-cost clock deskew
buffer,” in Proc. 19th IEEE Int. Symp. Defect Fault Tolerance VLSI Syst.,
Oct. 2004, pp. 202–210.
[10] M. Omaña, D. Rossi, and C. Metra, “Low cost scheme for on-line clock
skew compensation,” in Proc. 23rd IEEE VLSI Test Symp., May 2005,
pp. 90–95.
[11] H. C. Lin et al., “CMOS built-in test architecture for high-speed jitter
measurement,” in Proc. IEEE ITC, Oct. 2003, pp. 646–652.
[12] P. Dudek, S. Szczepanski, and J. V. Hatfield, “A high resolution CMOS
time-to-digital converter utilizing a Vernier delay line,” IEEE J. Solid-
State Circuits, vol. 35, no. 2, pp. 240–247, Feb. 2000.
[13] H. Wang, W. Zhou, Z. Li, S. Qian, W. Jiang, and C. Wang, “A time
and frequency measurement method based on delay-chain technique,”
in Proc. IEEE Int. Freq. Control Symp., May 2008, pp. 484–486.
[14] C.-C. Chung and W.-J. Chu, “An all-digital on-chip jitter measurement
circuit in 65nm CMOS technology,” in Proc. IEEE Int. Symp. VLSI Des.,
Autom. Test, Apr. 2011, pp. 1–4.
[15] K. Niitsu, M. Sakurai, N. Harigai, T. J. Yamaguchi, and H. Kobayashi,
“CMOS circuits to measure timing jitter using a self-referenced clock
and a cascaded time difference amplifier with duty-cycle compensa-
tion,” IEEE J. Solid-State Circuits, vol. 47, no. 11, pp. 2701–2710,
Nov. 2012.
[16] S. B. Samaan, “Parameter variation probing technique,” U.S. Patent
6 535 013, Mar. 18, 2003.
[17] Z. Abuhamdeh, B. Hannagan, A. Crouch, and J. Remmers, “A pro-
duction IR-drop screen on a chip,” in Proc. IEEE Des. Test Comput.,
Jun. 2000, pp. 216–224.
[18] C. Metra, M. Omaña, T. M. Mak, A. Rahman, and S. Tam, “Novel
on-chip clock jitter measurement scheme for high performance micro-
processors,” in Proc. IEEE Int. Symp. Defect Fault Tolerance VLSI Syst.,
Oct. 2008, pp. 465–473.
[19] M. Omaña, D. Giaffreda, C. Metra, T. M. Mak, S. Tam, and A. Rahman,
“On-die ring oscillator based measurement scheme for process parameter
variations and clock jitter,” in Proc. IEEE Int. Symp. Defect Fault
Tolerance VLSI Syst., Oct. 2010, pp. 265–272.
[20] Wavecrest Corporation, Eden Prairie, MN, USA.
(2005, Jul. 6). Jitter Fundamentals [Online]. Available:
http://www.wavecrest.co.jp/jittfun_hires_sngls.pdf
[21] T. J. Yamaguchi, M. Soma, D. Halter, R. Raina, J. Nissen, and M. Ishida,
“A method for measuring the cycle-to-cycle period jitter of high-
frequency clock signals,” in Proc. IEEE VLSI Test Symp., Apr./May
2001, pp. 102–110.
[22] P. Heydari and M. Pedram, “Analysis of jitter due to power-supply noise
in phase-locked loops,” in Proc. IEEE Conf. Custom Integr. Circuits,
May 2000, pp. 443–446.
[23] D. Rossi, A. Muccio, A. K. Nieuwland, A. Katoch, and C. Metra,
“Impact of ECCs on simultaneously switching output noise for on-chip
busses of high reliability systems [error correcting codes],” in Proc. 10th
IEEE On-Line Test. Symp., Jul. 2004, pp. 135–140.
[24] H. Lan, M. Han, and R. Schmitt, “Modeling and measurement of supply
noise induced jitter in a 12.8Gbps single-ended memory interface,” in
Proc. IEEE 21st Conf. EPEPS, Oct. 2012, pp. 43–46.
[25] S. R. Chan, F. N. Tan, and R. Mohd-Mokhtar, “Simultaneous switching
noise impact to signal eye diagram on high-speed I/O,” in Proc. IEEE
4th ASQED, Jul. 2012, pp. 200–205.
[26] R. Datta, G. Carpenter, K. Nowka, and J. A. Abraham, “A scheme for on-
chip timing characterization,” in Proc. IEEE VLSI Test Symp., May 2006,
pp. 24–29.
[27] J. Dunning, G. Garcia, J. Lundberg, and E. Nuckolls, “An all-digital
phase-locked loop with 50-cycle lock time suitable for high-performance
microprocessors,” IEEE J. Solid-State Circuits, vol. 30, no. 4,
pp. 412–422, Apr. 1995.
[28] Predictive Technology Model (PTM), San Jose, CA, USA [Online].
Available: http://ptm.asu.edu/
[29] Y. Tsugita, K. Ueno, T. Asai, Y. Amemiya, and T. Hirose, “On-chip
PVT compensation techniques for low-voltage CMOS digital LSIs,” in
Proc. IEEE ISCAS, May 2009, pp. 1565–1568.
[30] R. Chen, “High-speed CMOS frequency divider,” Electron. Lett., vol. 33,
no. 22, pp. 1864–1865, Oct. 1997.
[31] N. H. E. Weste and K. Eshraghian, Principles of CMOS VLSI Design:
A Systems Perspective. Reading, MA, USA: Addison-Wesley, 1993, ch. 5,
p. 327.
Martin Omaña received the Degree in electronic
engineering from the University of Buenos Aires,
Buenos Aires, Argentina, and the Ph.D. degree in
electronic engineering and computer science from
the University of Bologna, Bologna, Italy, in 2000
and 2005, respectively.
He was awarded a MADESS grant and joined the
University of Bologna, in 2002, where he is cur-
rently a Post-Doctoral Fellow. His current research
interests include fault modeling, on-line test, robust
design, fault tolerance, and photovoltaic systems.
Daniele Rossi received the Degree in electronic
engineering and the Ph.D. degree in electronic engi-
neering and computer science from the University
of Bologna, Bologna, Italy, in 2001 and 2005,
respectively.
He is currently a Post-Doctoral Fellow with the
University of Bologna. His current research interests
include fault modeling and fault tolerance, coding
techniques for fault tolerance and low-power sig-
nal integrity for communication infrastructures, and
robust design for soft error resiliency.
Daniele Giaffreda received the M.S. degree in elec-
tronic engineering from the University of Bologna,
Bologna, Italy, in 2009, where he is currently pur-
suing the Ph.D. degree with the Advanced Research
Center on Electronic Systems for Information and
Communication Technologies E. De Castro.
His current research interests include fault modeling
and electrical simulations, with particular emphasis
on the silicon photovoltaic solar cell.
Cecilia Metra (F’14) is a Professor of Electronics
with the University of Bologna, Bologna, Italy. Her
current research interests include fault modeling,
on-line test, robust design, fault tolerance, energy
harvesting, and photovoltaic systems.
Prof. Metra is a Vice-President for Technical and
Conference Activities of the IEEE Computer Society
(CS) for 2014, and a member of the Board of
Governors of the IEEE CS 2013-2015. Since 2013,
she has been Editor-in-Chief of the IEEE CS on-
line publication Computing. She is a Golden Core
Member of the IEEE CS.
T. M. Mak received the M.S. degree from Hong
Kong Polytechnic University, Hong Kong.
He was doing Test Research and Development at
Intel Corporation, Santa Clara, CA, USA, when this
paper was written, and is responsible for 2.5/3-D test
and DFT strategy at GLOBALFOUNDRIES, Santa
Clara. He has more than 30 years of experience
in microprocessor test, product development, design
automation, research mentoring, and DFT.
Asifur Rahman has been serving with Intel Cor-
poration, Santa Clara, CA, USA, as a Platform
Debug Architect for Silicon Debug Technology and
Research for the past six years. Prior to that, he
was actively designing circuits and performing sim-
ulation tasks for the state-of-the-art processors for
12 additional years. Since 1997, he has been actively
involved in the forefront of microprocessor design at
Intel and served as the Lead Designer for Floating
Point Divider. He holds specific expertise for dense
low-leakage circuit design with extreme high-speed
operations. In 2002, he invented the first optical logic recognition technology
using 1064-nm emission light directly from silicon. In 2005, he invented the
industry’s first software integration engine that can connect multiple OS-based
CAD applications and data models with physical probing hardware. He is an
Adjunct Faculty Instructor with Portland State University, Portland, OR, USA,
and actively conducts university level research with interns from various U.S.
and Japan institutions. He has authored eight technical journals and conference
papers, and holds three U.S. patents in the related fields of silicon debug and
software integration and one international patent on solar technology.
Simon Tam (SM’07) received the B.S., M.S., and
Ph.D. degrees in electrical engineering and com-
puter sciences from the University of California at
Berkeley, Berkeley, CA, USA.
He was a Senior Principal Engineer with the
Microprocessor Development Group, Intel Corpo-
ration, Santa Clara, CA, USA, in 2011, engaged
with the design of server microprocessors with spe-
cial emphasis on high-frequency clocking architec-
ture and circuits. Before joining the Microprocessor
Development Group, he was with the Intel Neural
Network Group. He designed electrically programmable neural network chips
using analog VLSI techniques and EEPROM technology. He was also with
the Intel California Technology Development Division engaged with the
development of flash memory and EEPROM. He holds 29 U.S. patents, has
authored and co-authored 47 technical publications, and has authored one
book chapter, all in the areas of microprocessor designs, nonvolatile memory,
and neural network circuit technologies.
Dr. Tam was a member of the Technical Program Committee of the
2007–2010 Symposium on VLSI Circuits and the 2009 Custom Integrated
Circuit Conference.
