A High Resolution Flash Time-to-Digital Converter Taking Into Account Process Variability
 
 
Nikolaos Minas, David Kinniment, Keith Heron, Gordon Russell 
Newcastle University, UK 
{Nikolaos.Minas,David.Kinniment,Keith.Heron,G.Russell}@ncl.ac.uk 
 
 
Abstract 
 
Timing issues are a major concern in the design of high performance synchronous and asynchronous circuits and GALS systems. Investigations into the causes of many timing problems cannot be satisfactorily undertaken using external equipment because of its remoteness from the source of the potential problem; this necessitates the development of on-chip time measurement circuitry. Current techniques are capable of resolving timing differences down to 5ps [1]; however, further improvement is impeded by process variations. This paper describes a flash Time-to-Digital Converter (TDC) suitable for on-chip implementation. The theory for overcoming the effects of process variations, potentially permitting time resolution down to one picosecond, is described. Proof of concept is demonstrated by implementing the techniques in an FPGA, improving on the resolution of current FPGA implementations of TDCs.
 
1. Introduction 
 
The motivation for the design of a flash TDC with 
resolution of a few picoseconds comes from the area of 
Globally Asynchronous Locally Synchronous (GALS) 
systems on chip that use IP blocks from different 
vendors with unrelated clocks and may use many 
synchronizers to resynchronize the data transmitted 
from one IP block to another.  All reliability 
projections of the synchronizers used are based on an 
extrapolation of a simplified Mean Time Between 
Failure (MTBF) formula, which does not take into 
account important effects such as voltage, process and 
temperature variations. A method for measuring deep metastability has been presented by Kinniment et al [2]. However, this method can be costly, since it requires external test equipment to produce the histogram used in the measurement process. By contrast, the method presented in this paper produces the histogram on-chip. It is also more accurate, since the conversion time of the TDC is shorter than the dead time of the oscilloscope between successive measurements. Moreover, the method can be used in FPGAs as well as in on-chip applications, since the implementation is architecture independent. The technique can also be used for clock jitter measurement.
In the past, the main methods used to achieve picosecond resolution were based either on the analogue time stretching principle or on time-to-amplitude conversion followed by an analogue-to-digital converter [3]. To achieve resolutions as low as 1ps, these methods were combined with the commonly used interpolation method first presented by Nutt [4]. FPGA-based TDCs have also been presented with good results. The first to use an FPGA was Kalisz in 1997 [5], who achieved a resolution of 200ps using a pASIC FPGA from QuickLogic; a more recent design by Song et al [6] uses the dedicated carry lines of a Xilinx Virtex-II FPGA to achieve resolutions down to 65ps.
In this paper it is shown how a high resolution flash TDC can be implemented. The main advantage of this method over conventional methods is that it does not rely upon delay lines to quantize a step interval, so silicon area can be saved. Moreover, the resolution achieved can be further improved by adding more stages to the TDC, even when the time variability is greater than the time step between stages. Because of its fast conversion time and high resolution, the TDC presented is well suited to measuring clock jitter and to characterizing deep metastability, which in turn allows the reliability of synchronizers to be estimated accurately. Section 2 presents the background needed to understand flash TDCs and the reasons for moving to a new measurement technique. Section 3 describes an on-chip implementation of the TDC using asymmetric MUTEXes with a progressively increasing time offset. The theory behind the proposed TDC is discussed in Section 4, where the effects of process variability on the circuit operation are presented. As a proof of concept, the proposed TDC has been implemented in an FPGA; the implementation of a 32-level flash TDC in a Virtex-II FPGA is described in Section 5. The results obtained show that the flash TDC can be used in FPGA applications as well as on-chip. The calibration of the TDC is described in Section 6, with the results presented in Section 7. Conclusions and future work are presented in Section 8.
13th IEEE International Symposium on Asynchronous Circuits and Systems (ASYNC'07)
0-7695-2771-X/07 $20.00  © 2007
Authorized licensed use limited to: Newcastle University. Downloaded on June 08,2010 at 14:32:37 UTC from IEEE Xplore.  Restrictions apply. 
 
2. Time-to-Digital Converters Background 
 
Flash Time-to-Digital converters are analogous to flash Analogue-to-Digital converters for voltage amplitude encoding. Both operate by comparing an input signal with a set of equally spaced references; in a TDC these are reference edges displaced equally in time. The elements used to compare the input signal with the reference are either Flip-Flops or MUTEXes (circuits that decide which signal comes first). Several different configurations of the flash TDC are outlined below.
 
2.1 Single Delay chain TDC 
 
The single delay chain flash TDC, shown in Figure 1, operates by comparing the Stop signal with the reference signal (Start) [7]; each delay element produces a delay equal to τ. A DLL (Delay Locked Loop) is often employed to ensure that the value of τ is known accurately.
The operation of the single delay chain TDC in measuring the time difference ∆T between the rising edges of signals Start and Stop is shown in the timing diagram in Figure 2. Each Flip-Flop compares the displacement in time of the Start signal with that of the Stop signal.
 
 
Figure 1 Single Delay Chain Flash TDC [7] 
The thermometer code produced gives an approximation of the time difference between the two signals. For instance, in the example of Figure 2 the Stop signal is caught by the Start signal at Flip-Flop 6, so the time difference can be approximated as $5\tau \le \Delta T \le 6\tau$. The main limitation of this implementation is that the resolution is equal to a single gate delay.
Figure 2 Timing diagram of Single Delay Chain 
TDC 
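The decoding described above can be sketched in a few lines of Python. This is a minimal model under stated assumptions, not the circuit of [7]: flip-flops are ideal, flip-flop k (numbered from 1) is clocked by the Start edge delayed by kτ and samples Stop as its data input, and it reads 1 once Stop has arrived. The function names and example values are hypothetical.

```python
from typing import List, Optional, Tuple


def single_chain_tdc(delta_t: float, tau: float, n_stages: int) -> List[int]:
    """Thermometer code of an ideal single delay chain TDC (assumed model).

    Flip-flop k (k = 1..n_stages) is clocked by the Start edge delayed by
    k*tau and outputs 1 if the Stop edge, arriving delta_t after Start,
    has already arrived at that moment.
    """
    return [1 if k * tau >= delta_t else 0 for k in range(1, n_stages + 1)]


def decode(code: List[int], tau: float) -> Optional[Tuple[float, float]]:
    """Return the (lower, upper) bounds on delta_t implied by the code."""
    if 1 not in code:
        return None  # Stop arrived after the last tap: out of range
    first = code.index(1) + 1  # 1-based number of the first flip-flop set
    return ((first - 1) * tau, first * tau)


# Hypothetical example: delta_t = 5.4 time units, tau = 1 unit, 8 stages.
code = single_chain_tdc(delta_t=5.4, tau=1.0, n_stages=8)
print(code)               # [0, 0, 0, 0, 0, 1, 1, 1]
print(decode(code, 1.0))  # (5.0, 6.0)
```

The interval returned is always one τ wide, which is the single-gate-delay resolution limit noted above.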
 
2.2 Vernier Delay (VDL)  
 
To achieve resolutions better than one gate delay, the 
Time-to-Digital converter can be constructed with a 
Vernier Delay line as shown in Figure 3.  
 
 
Figure 3 Vernier Delay (VDL) TDC [8] 
The delay for the Start signal is greater than that for the Stop signal, and the difference between the two delay lines gives the resolution of the TDC, as described by Dudek et al [8]. When we used this method to design a TDC in an FPGA, the resolution was found to be no better than 200ps, because the Look Up Tables (LUTs) used had long propagation delays. The routing and placement of the design also played a significant role: in FPGAs the routing cannot be easily controlled, and as a result the resolution can be significantly degraded.
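A minimal sketch of an ideal Vernier delay line makes the "difference of the two delay lines" concrete. It assumes integer picosecond delays, that Start passes through elements of delay `tau_start` and Stop through elements of delay `tau_stop < tau_start`, and that stage i outputs 1 while Start still leads Stop at that stage. The parameter values are hypothetical, not the FPGA delays measured by the authors.

```python
from typing import List, Tuple


def vernier_tdc(delta_t: int, tau_start: int, tau_stop: int, n_stages: int) -> List[int]:
    """Thermometer code of an ideal Vernier delay line TDC (times in ps).

    At stage i (1-based) Start arrives at i*tau_start and Stop at
    delta_t + i*tau_stop, so Stop gains (tau_start - tau_stop) per stage.
    A stage outputs 1 while Start still precedes Stop.
    """
    if tau_start <= tau_stop:
        raise ValueError("the Start line must be slower than the Stop line")
    return [
        1 if i * tau_start < delta_t + i * tau_stop else 0
        for i in range(1, n_stages + 1)
    ]


def vernier_bounds(code: List[int], tau_start: int, tau_stop: int) -> Tuple[int, int]:
    """With n ones, delta_t lies in (n*r, (n+1)*r], r = tau_start - tau_stop."""
    resolution = tau_start - tau_stop
    n = sum(code)
    return (n * resolution, (n + 1) * resolution)


# Hypothetical delays: 550 ps and 500 ps elements give a 50 ps resolution.
code = vernier_tdc(delta_t=175, tau_start=550, tau_stop=500, n_stages=8)
print(code)                             # [1, 1, 1, 0, 0, 0, 0, 0]
print(vernier_bounds(code, 550, 500))   # (150, 200)
```

Integer picoseconds are used so that no stage boundary is blurred by floating-point rounding.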
 
2.3 Process variation based TDC 
 
A flash time-to-digital converter can also be 
constructed utilizing the inherent process variations of 
the elements used in the measurement circuit to 
quantize a step interval, thus reducing the silicon area 
overhead. The resulting TDC circuit is shown in Figure 
4.   
 
Figure 4 Process variation based TDC [9]
A flash TDC that exploits the random offsets of Flip-Flops or arbiters to perform time quantisation has been presented by Levine and Roberts [9], demonstrating that resolutions of a few picoseconds can be achieved; the TDC was implemented in a 0.18µm CMOS process. This method relies on transistor mismatch in the Flip-Flops to produce variation in their timing offsets, whereas the Flip-Flops in the measurement techniques described earlier are assumed to be ideal. However, this means that each TDC must be individually calibrated. Moreover, the calibration technique, based on additive temporal noise, requires external test equipment to find the Flip-Flop offsets and can be difficult to implement.
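Why such a converter needs per-chip calibration can be shown with a simplified model (not the circuit of [9]): each of n comparators is given a hypothetical Gaussian offset (seeded for repeatability), and the output is the number of comparators whose offset lies below the input interval. The mean transfer curve follows n·Φ(ΔT/σ) rather than a straight line, and the actual offsets of a given chip are unknown in advance. The value σ = 10ps is an illustrative assumption.

```python
import math
import random
from typing import List


def make_offsets(n: int, sigma_ps: float, seed: int = 1) -> List[float]:
    """Hypothetical random comparator offsets (ps), one per flip-flop."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, sigma_ps) for _ in range(n)]


def tdc_count(delta_t_ps: float, offsets: List[float]) -> int:
    """Number of comparators that decide 'Start first' for this interval."""
    return sum(1 for off in offsets if delta_t_ps > off)


def expected_count(delta_t_ps: float, n: int, sigma_ps: float) -> float:
    """Mean count over many chips: n * Phi(delta_t / sigma)."""
    phi = 0.5 * (1.0 + math.erf(delta_t_ps / (sigma_ps * math.sqrt(2.0))))
    return n * phi


# Hypothetical chip: 64 comparators, offset s.d. 10 ps.
offsets = make_offsets(n=64, sigma_ps=10.0)
for dt in (-20.0, -5.0, 0.0, 5.0, 20.0):
    print(dt, tdc_count(dt, offsets), round(expected_count(dt, 64, 10.0), 1))
```

The count for one chip departs from the mean curve by an amount that depends on that chip's offsets, so a table mapping counts to time has to be measured for each device.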
 
3. Asymmetrical MUTEX-based TDC 
 
An alternative time delay method can be implemented in a custom on-chip design. The input time interval (in the form of two rising edges) is applied simultaneously to an array of MUTEXes which have a progressive built-in offset, as shown in Figure 5. The input signals need to be buffered because of the large input capacitance of the MUTEX array. Along the array, the MUTEX outputs settle in one state up to the point where the input time interval matches the built-in offset, and in the other state beyond it. The result is a "thermometer" output code whose resolution is the difference in time offset between successive MUTEXes. Of course, because MUTEXes are used, the closer the time interval is to the offset, the longer the resolution time (onset of metastability). This would be a limiting factor for bursts of events.
 
 
Figure 5 Converter block Diagram 
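The behaviour of the converter of Figure 5 can be sketched as follows, ignoring metastability. The sketch assumes that MUTEX k has a built-in offset of `first_offset_ps + k * step_ps` (offsets increasing progressively along the array) and that its output is 1 when the input interval exceeds its offset. The 10ps step and 16 MUTEXes are illustrative values chosen to match the scale of the simulation in Section 3.1.

```python
from typing import List


def mutex_array(interval_ps: float, step_ps: float, n: int,
                first_offset_ps: float = 0.0) -> List[int]:
    """Ideal thermometer code of an array of asymmetric MUTEXes.

    MUTEX k has built-in offset first_offset_ps + k * step_ps; it resolves
    in favour of the earlier edge (output 1) only if the interval between
    the inputs exceeds its offset, otherwise in favour of the later edge (0).
    """
    return [
        1 if interval_ps > first_offset_ps + k * step_ps else 0
        for k in range(n)
    ]


def interval_estimate(code: List[int], step_ps: float,
                      first_offset_ps: float = 0.0) -> float:
    """Offset of the last MUTEX that set: a lower bound on the interval."""
    ones = sum(code)
    if ones == 0:
        raise ValueError("interval below the first offset")
    return first_offset_ps + (ones - 1) * step_ps


code = mutex_array(interval_ps=63.0, step_ps=10.0, n=16)
print(code)                            # seven 1s followed by nine 0s
print(interval_estimate(code, 10.0))   # 60.0
```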
When a MUTEX is presented with two rising edges separated by a very short time interval, the time taken for the MUTEX to resolve which came first is given by

$$t_{res} = -\tau \ln(\Delta + t_{offset})$$

Here, ∆ is the time interval between the input rising edges and τ is the resolution time constant of the MUTEX. For a symmetrically constructed MUTEX, $t_{offset}$ should be zero. The resolution (settling) time of a symmetrical or balanced latch has been addressed by several authors [10], [11]. However, if asymmetry makes the offset non-zero, then it is possible for the MUTEX to resolve in favour of the signal with the later edge.
If signal input A is in advance of signal B when the 
signals are applied to a symmetrical MUTEX, the 
MUTEX would resolve in favour of A. However, when 
the same signals are submitted to an asymmetrical 
MUTEX which favours B the MUTEX will resolve in 
favour of B until the time interval (of A in advance of 
B) is greater than the offset.  
 
3.1 Simulation Results

Simulations of the flash converter were performed using ORCAD 10 with 0.18 micron process models. The effect of different input rise times on the converter was also investigated by simulation and found to be small. Figure 6 shows the output response of the MUTEX array as the time interval between the square wave inputs changes. In this fragment of the simulation, seven MUTEXes are shown to set, but when the next (shorter) input interval arrives, only six MUTEXes set. The calibration curve of the TDC, shown in Figure 7, was generated from the complete set of simulation results. From it, the resolution of the TDC, which comprises 16 MUTEXes, can be seen to be approximately 10 picoseconds.
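The resolution read from a calibration curve such as Figure 7 is the inverse of its slope. A minimal sketch using an ordinary least-squares fit written out by hand; the data points are synthetic, generated from an ideal 16-MUTEX converter with set points 10ps apart, and are not the simulation results plotted in Figure 7.

```python
from typing import Sequence


def least_squares_slope(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Slope of the ordinary least-squares straight line through (xs, ys)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


# Synthetic calibration data (hypothetical): an ideal 16-MUTEX converter
# whose set points are spaced 10 ps apart, centred on 0 ps.
set_points = [-75.0 + 10.0 * k for k in range(16)]
intervals = list(range(-70, 70))  # stay inside the linear range
counts = [sum(1 for sp in set_points if t > sp) for t in intervals]

slope = least_squares_slope(intervals, counts)  # counts per ps
print(f"resolution ~ {1.0 / slope:.2f} ps per count")
```

Fitting over the whole linear range, rather than reading one step, averages out the staircase shape of the output.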
 
M
ut
ex
O
ut
pu
ts
 a
nd
 S
qu
ar
e 
w
av
e 
In
pu
ts
, 0
.2
V
/d
iv
Time, 100ps/div
Mutex Output about to 
disappear
 
Figure 6 Segment of simulation plot of Flash 
TDC. 
Figure 7 TDC output versus time interval (converter "thermometer" output, -15 to +15, versus input interval, -150ps to +150ps; TDC output and slope).
 
3.2 Simulating the Effects of Process Variation

Although Section 4 outlines the theory of how high resolution measurements can be achieved in the presence of process variations, a preliminary empirical investigation of the effect was undertaken by altering the dimensions of the transistors in the MUTEXes used in the flash TDC by up to 50% and determining the resulting changes in time offset by simulation.
Figure 8 shows a typical comparison, as transistor widths are increased, between the offset of an ideal MUTEX and that of a MUTEX in which the transistor dimensions are randomly varied by 10%. The effect of random variations is greatest when the transistor width is smallest. (The large transistor widths in Figure 8 are a result of the large 10ps step size used in the initial simulation; for a smaller step size of 2ps the width comes down proportionately.) It is expected that, by adopting the techniques outlined in Section 4, high resolution measurements can be achieved under these conditions.
Figure 8 Transistor width versus ideal offset and process-modified offset (offset in picoseconds, 0 to 100, versus transistor width in microns, 0 to 12; curves: ideal, 10% variation).
 
4. Effects of Process Variability 
 
To investigate the effects of process variability in a MUTEX circuit, PSPICE was used with the 0.18µm TSMC process model. For the MUTEX circuit shown in Figure 9, we found the time difference between the signal and the reference waveform required to bring the circuit back into balance after a 10% variation in the width or length of each transistor.
By assuming that the total change in time difference for the circuit equals the sum of the differences due to each individual change, we could find the effect of any random change in widths and lengths. The width and length parameters of every transistor were then varied by a random amount, with a standard deviation of 10%.
 
Figure 9 MUTEX time measurement (inputs A and B; reference-first and signal-first cases).
The RMS offset of 2000 random variations of the circuit was then calculated, showing that the offset was normally distributed, with a standard deviation of 2.028ps in a typical MUTEX. Thus, even if there are many MUTEXes closely spaced in time, the error in time measurement for any one MUTEX is likely to be around 2ps because of the random variation in its offset.
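The linear superposition assumed above can be sketched as a small Monte Carlo experiment. Each transistor parameter k is given a balancing time shift for a 10% change; a random change with a 10% standard deviation then scales that shift by z ~ N(0, 1), and the shifts are summed. The sensitivities below are hypothetical placeholders, not values extracted from the TSMC model. For a linear sum the analytic standard deviation is sqrt(Σ shiftₖ²), which the 2000-sample RMS should approach.

```python
import math
import random
from typing import List


def monte_carlo_offsets(shift_per_10pct_ps: List[float], trials: int,
                        seed: int = 0) -> List[float]:
    """Offsets (ps) of randomly varied MUTEXes under linear superposition.

    shift_per_10pct_ps[k] is the balancing time shift produced by a 10%
    change in parameter k alone; each trial scales it by z ~ N(0, 1),
    i.e. a random change with a 10% standard deviation, and sums them.
    """
    rng = random.Random(seed)
    return [
        sum(dt * rng.gauss(0.0, 1.0) for dt in shift_per_10pct_ps)
        for _ in range(trials)
    ]


# Hypothetical per-parameter sensitivities (ps per 10% change), W and L
# for each transistor; NOT the values from the paper's PSPICE runs.
shifts = [0.9, -0.8, 0.7, -0.6, 0.5, 0.5, -0.4, 0.4, 0.3, -0.3, 0.2, 0.2]

offsets = monte_carlo_offsets(shifts, trials=2000)
rms = math.sqrt(sum(o * o for o in offsets) / len(offsets))
analytic = math.sqrt(sum(dt * dt for dt in shifts))
print(f"RMS of 2000 trials: {rms:.3f} ps, analytic: {analytic:.3f} ps")
```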
Because of this variation, each MUTEX output in a TDC may not change state exactly at the time required, but because the distribution of errors is normal, the probability of it changing at any particular time can be calculated from the cumulative normal distribution (error function) with a standard deviation of 2ps.
 
MUTEX                               0     1     2     3     4     5     6     7     8     9
Offset, ps                       -4.5  -3.5  -2.5  -1.5  -0.5  +0.5  +1.5  +2.5  +3.5  +4.5
Probability of a high output, %  98.8  96.0  89.4  77.3  59.9  40.1  22.7  10.6   4.0   1.2
Table 1 Probability of a high output 
Table 1 illustrates this calculation for a 10 MUTEX TDC in which the MUTEXes are set to change state at 1ps intervals, i.e. with nominal set points between -4.5ps and +4.5ps.
For an input with 0ps time difference, MUTEXes 0-4 ought to be high and 5-9 low, but because there is a random variation in the set point of each MUTEX, with a standard deviation of 2ps, a MUTEX with a set point of 0ps would be only 50% likely to be high for this input, and one with a set point of -1.5ps is still only 77.3% likely to be high. Table 1 shows that, for a 0ps input, the probability of each MUTEX being high varies from 98.8% for a set point of -4.5ps to 1.2% for a set point of +4.5ps.
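Table 1 can be reproduced from the normal set-point model in a few lines. This sketch assumes, as the text does, that the actual set point of each MUTEX is normally distributed about its nominal value with σ = 2ps, so that the probability of a high output is Φ((input − set point)/σ). The function name `p_high` is introduced here for illustration.

```python
import math


def p_high(set_point_ps: float, input_ps: float, sigma_ps: float) -> float:
    """Probability that a MUTEX with the given nominal set point reads high.

    The actual set point is modelled as normal(set_point_ps, sigma_ps); the
    output is high when the input exceeds it, so P = Phi((input - sp)/sigma).
    """
    z = (input_ps - set_point_ps) / sigma_ps
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


set_points = [-4.5 + k for k in range(10)]  # 1 ps spacing, MUTEXes 0..9
probs = [p_high(sp, input_ps=0.0, sigma_ps=2.0) for sp in set_points]
print([round(100 * p, 1) for p in probs])
# [98.8, 96.0, 89.4, 77.3, 59.9, 40.1, 22.7, 10.6, 4.0, 1.2]
```

A set point one standard deviation below the input, `p_high(-2.0, 0.0, 2.0)`, gives about 84%.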
In this 10 MUTEX example there are 2^10 = 1024 possible output patterns, and the probability of each pattern depends on the timing of the inputs to the TDC. We can compute the probability of each pattern for an input time difference of 0ps by taking the probability of each output matching the corresponding bit in the required pattern and finding the product of these 10 bit-level probabilities. In this example, the probability of getting 1111100000 is about 15.4%, and the probability of 1111010000 is much lower at 6.9%. Adding up the probabilities of all patterns with exactly five high outputs gives the total probability of getting exactly five high outputs from the 10 MUTEXes, which in this case is 37.4%.
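The pattern probabilities and the distribution of the number of highs (Table 2) can be computed as sketched below, assuming the MUTEX outputs are independent. Summing explicitly over all 1024 patterns would work, but a short dynamic-programming recursion (the Poisson binomial distribution) gives the same result directly. The helper names are hypothetical.

```python
import math
from typing import List


def p_high(set_point_ps: float, input_ps: float, sigma_ps: float) -> float:
    """P(MUTEX reads high) for a normally distributed set point."""
    z = (input_ps - set_point_ps) / sigma_ps
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def pattern_probability(pattern: str, probs: List[float]) -> float:
    """Product of per-bit probabilities of matching the given pattern."""
    result = 1.0
    for bit, p in zip(pattern, probs):
        result *= p if bit == "1" else 1.0 - p
    return result


def count_distribution(probs: List[float]) -> List[float]:
    """P(exactly k outputs high), k = 0..n, for independent MUTEXes.

    Equivalent to summing pattern_probability over all 2**n patterns with
    k ones, but computed by dynamic programming in O(n**2) steps.
    """
    dist = [1.0]  # with no MUTEXes, zero highs has probability 1
    for p in probs:
        new = [0.0] * (len(dist) + 1)
        for k, q in enumerate(dist):
            new[k] += q * (1.0 - p)  # this MUTEX low
            new[k + 1] += q * p      # this MUTEX high
        dist = new
    return dist


probs = [p_high(-4.5 + k, 0.0, 2.0) for k in range(10)]
print(f"{100 * pattern_probability('1111100000', probs):.1f}%")  # 15.4%
print(f"{100 * pattern_probability('1111010000', probs):.1f}%")  # 6.9%
print([round(100 * q, 2) for q in count_distribution(probs)])
```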
A table of the probabilities for each number of high outputs from 0 to 10, computed in this way, is given in Table 2.
 
Number                                   0     1     2     3      4      5      6     7     8     9    10
Probability of number of highs, %     0.00  0.02  0.65  6.41  24.20  37.43  24.20  6.41  0.65  0.02  0.00
Table 2 Probability of a given number of high 
outputs 
 
Table 2 shows that, for a 0ps input, 37.43% of all TDCs will give exactly five highs, 24.2% will give four highs and 24.2% will give six highs, so 24.2% + 37.43% + 24.2% ≈ 86% of time measurements for a 0ps input will give a count of 4, 5 or 6. The number of high outputs is therefore within one of the correct value for 86% of TDCs. Since a change in the count of one is equivalent to a 1ps change in the input, we can use these statistics to obtain the standard deviation of the error when the number of highs is used as the measure of time. Here this deviation is 1.1ps rather than 2.03ps. MUTEXes with set points beyond +4.5ps or below -4.5ps make no real contribution to the measurement, and those that make the most significant contribution lie within 2.03ps on either side of the input value.
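The 1.1ps figure can be checked directly. For a 0ps input the expected count is five, so, assuming independent MUTEX outputs, the variance of the count is the sum of the per-MUTEX Bernoulli variances p(1 − p). A minimal sketch with the same normal set-point model as Table 1, at 1ps per count:

```python
import math

SIGMA_PS = 2.0   # standard deviation of each MUTEX set point
STEP_PS = 1.0    # nominal spacing of the set points (1 count = 1 ps)


def p_high(set_point_ps: float, input_ps: float) -> float:
    """P(MUTEX reads high) for a normally distributed set point."""
    z = (input_ps - set_point_ps) / SIGMA_PS
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


probs = [p_high(-4.5 + STEP_PS * k, 0.0) for k in range(10)]
mean_count = sum(probs)                        # 5 by symmetry
var_count = sum(p * (1.0 - p) for p in probs)  # independent bits
error_sd_ps = math.sqrt(var_count) * STEP_PS

print(f"mean count {mean_count:.2f}, error s.d. {error_sd_ps:.2f} ps")
# mean count 5.00, error s.d. 1.06 ps
```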
It can be said that, with a spacing of 1ps, there are typically 4 MUTEXes contributing to the measurement, because that is how many set points lie within one standard deviation on either side of the input, (2ps + 2ps)/1ps = 4, and the effective accuracy is improved by a factor of √4 = 2, from 2ps to 1ps. In general, if there is a random variation in the offset with a standard deviation of σ, then the standard deviation of the measurement error due to this variation will be approximately

$$\frac{\sigma}{\sqrt{2\sigma/s}} = \sqrt{0.5\,\sigma s}$$

where s is the time step between successive MUTEXes.
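The approximation can be written as two small helper functions (names hypothetical). The number of contributing MUTEXes is 2σ/s, and the error is σ divided by the square root of that number; the example reproduces the 2ps to 1ps improvement quoted above.

```python
import math


def contributing_mutexes(sigma_ps: float, step_ps: float) -> float:
    """MUTEXes whose set points lie within one s.d. either side of the input."""
    return 2.0 * sigma_ps / step_ps


def variation_error_ps(sigma_ps: float, step_ps: float) -> float:
    """Approximate s.d. of the measurement error due to set-point variation.

    sigma / sqrt(2*sigma/s) == sqrt(0.5*sigma*s); only meaningful while the
    step is smaller than sigma, so that several MUTEXes contribute.
    """
    return sigma_ps / math.sqrt(contributing_mutexes(sigma_ps, step_ps))


print(contributing_mutexes(2.0, 1.0), variation_error_ps(2.0, 1.0))  # 4.0 1.0
```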
This shows that with sufficient MUTEXes, accuracies 
of much better than σ can be obtained by reducing the 
time step so that each measurement is the result of 
many individual MUTEX outputs.  Unfortunately this 
may require a large number of MUTEXes, and the 
averaging effect is reduced at the extreme ends of the 
scale, as shown in Figure 10. 
 
Figure 10 Effect of uncertainty at the ends of the scale
If an input time difference is in the middle of the scale, as in the 16 MUTEX TDC output shown at the top of Figure 10, the 4 bit uncertainty due to variation in individual MUTEX set points is fully contained within the 16 bit output, and there will be 8 high outputs and 8 low, giving the correct value. If it is at the extreme end of the scale, as in the centre register, the uncertainty is not fully contained within the 16 bits, and what should be a zero output actually contains a single 1, because only 16 bits are available. This can be overcome by adding 2 extra MUTEXes at each end of the scale, as shown in the bottom register, so that the number of high outputs varies linearly from 2 to 18 rather than from 0 to 16. The number of extra bits that need to be added at each end is given by σ/s, which considerably increases the cost of the TDC as more accuracy is sought.
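The cost and range trade-off can be sketched under the σ/s rule stated above. Rounding σ/s up to a whole number of MUTEXes with `ceil` is an assumption of this sketch; the small `round` guards against floating-point results such as 10.000000000000002 being rounded up to 11.

```python
import math


def mutexes_required(n_outputs: int, sigma_ps: float, step_ps: float) -> int:
    """Total MUTEXes when ceil(sigma/s) extra ones are added at each end."""
    extra_per_end = math.ceil(round(sigma_ps / step_ps, 9))
    return n_outputs + 2 * extra_per_end


def measurement_range_ps(n_outputs: int, step_ps: float) -> float:
    """Time range covered by the nominal outputs."""
    return n_outputs * step_ps


print(mutexes_required(16, 2.0, 1.0))    # 20: 2 extra at each end
print(mutexes_required(64, 2.0, 3.0))    # 66: 2 extra in total
print(measurement_range_ps(64, 3.0))     # 192.0
```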
Figure 11 Noise vs. time step (noise in ps versus MUTEX time step in ps; quantisation noise, single MUTEX noise and 64 MUTEX converter noise).
The accuracy of a 64 MUTEX TDC was modelled as the step size is varied, and the results are shown in Figure 11, which gives the noise due to quantisation alone, $s/\sqrt{12}$, the noise due to variation in a single MUTEX set point plus quantisation noise, and the noise in a complete 64 MUTEX converter. Both theoretical lines and points resulting from simulation are shown. The effect of the end correction on the number of MUTEXes is shown in Figure 12.
As the step size decreases, the number of MUTEXes increases. At 0.2ps the number is 93, giving a noise figure of 0.46ps and a range of 12.8ps. Below this point the hardware requirements increase rapidly without a corresponding increase in accuracy. At a step size of 3ps only 2 extra MUTEXes are required, the noise is around 1.8ps and the range is 192ps. Above that, the major noise contribution is from quantisation, so the technique is indistinguishable from conventional methods. Step sizes between σ and σ/4 give improved accuracy with little extra cost.
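A rough noise-versus-step model in the spirit of Figure 11 is sketched below. It assumes, as an approximation not stated in the text, that the quantisation noise s/√12 and the variation term √(0.5σs) are independent and add in quadrature, and it ignores end effects, so it will not reproduce the plotted points exactly.

```python
import math

SIGMA_PS = 2.0  # set-point standard deviation used in the examples above


def quantisation_noise_ps(step_ps: float) -> float:
    """RMS quantisation noise of a uniform quantiser with step s."""
    return step_ps / math.sqrt(12.0)


def converter_noise_ps(step_ps: float, sigma_ps: float = SIGMA_PS) -> float:
    """Quantisation and variation noise combined in quadrature (assumed)."""
    variation = math.sqrt(0.5 * sigma_ps * step_ps)
    return math.hypot(quantisation_noise_ps(step_ps), variation)


for step in (0.2, 0.5, 1.0, 2.0, 3.0):
    print(f"s = {step:.1f} ps: quantisation {quantisation_noise_ps(step):.2f} ps, "
          f"total {converter_noise_ps(step):.2f} ps")
```

At s = 0.2ps this gives about 0.45ps, close to the 0.46ps quoted above. As s approaches σ the averaging argument behind √(0.5σs) weakens, so the simple model is likely to drift from the quoted figures at the largest steps.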
 
Figure 12 Range and cost of a 64 output TDC (number of MUTEXes and range in ps versus step in ps).

5. FPGA-based TDC Implementation 
 
To demonstrate the potential effectiveness of the approach, a flash Time-to-Digital converter was implemented as a proof of concept in a Xilinx Virtex-II 1000 FPGA, manufactured in a 0.35 CMOS process and featuring one million equivalent gates. The TDC was implemented using the software supplied by the FPGA vendor; no additional software was used to improve the performance or the routing of the design, since the primary aim was to construct a TDC that can be easily implemented in any FPGA architecture.
The main problem found when an FPGA is employed for the implementation of the flash TDC is that the routing of the design is managed entirely by the software, which can add extra propagation delays that significantly degrade the resolution of the TDC. However, the experiments carried out showed that elements in the Logic Blocks of the FPGA, such as the XOR gates used in carry operations (XORCY), follow specific routing paths irrespective of the location of the design, unlike Look Up Tables (LUTs), whose placement as delay elements is entirely up to the software. This made it possible to predict and calculate the interconnect delays accurately. These elements were then used to drive the Stop and Start signals of the flash TDC. It is important to note that the entire placement was done manually and all the optimization offered by the software was turned off, since each component played a significant role and any optimization could affect the performance of the TDC.
All of the measurement techniques described in Section 2 were implemented in the Xilinx FPGA in an attempt to find which one produced the best results. The resolutions produced by the single delay chain and Vernier delay methods were found to be unacceptable for the use of the TDC in the measurement of jitter and metastability. Also, because of the delay elements used, the implementation of these measurement techniques can differ in other FPGA architectures; as a result, the resolution of the TDC may vary and inconsistent results will be produced. A measurement technique that has high resolution, is independent of the FPGA architecture and is easy to implement is therefore desirable. A method similar to the one described in Section 2.3 is used. However, unlike the measurement technique presented by Levine and Roberts [9], the flash TDC proposed in this paper uses a progressively increasing time difference between the Start and Stop signals to define the time, which is achieved by using the different delays produced by the Xilinx software for the data (Stop) and clock (Start) paths. By using the XORCY gates it was possible to achieve almost identical routing for both data and clock paths, unlike using a clock buffer, which splits the design into several clocking areas and makes it impossible to achieve a progressively increasing time difference between the two signals. However, the total data path interconnect delay tends to be longer than the clock path delay, which is believed to be due to loading effects on the Flip-Flops used. The difference in delay therefore arises from the difference between the general purpose routing and the dedicated clock network. Both the clock and the data paths are laid out progressively, starting from Flip-Flop zero, then adding a link to Flip-Flop one, a further link to Flip-Flop two and so on up to Flip-Flop thirty-one. Random variation in the routing paths, caused by the routing of the design, can create discontinuities in the measurements ("bubbles"). These errors can be reduced by averaging over the random variation.
Another characteristic of the proposed TDC is its straightforward calibration method: it does not depend upon process variability and added noise to define a time step, and no external equipment is necessary.
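The effect of a bubble can be illustrated by comparing two ways of decoding a thermometer code: taking the position of the first 0, which a single bubble can shift badly, and counting the 1s, which is the "number of highs" measure used in Section 4 and is unchanged when a bit is merely misplaced. Both decoders and the example codes are illustrative assumptions, not the logic implemented on the FPGA.

```python
from typing import List


def first_zero_decode(code: List[int]) -> int:
    """Index of the first 0: the transition point assuming a clean code."""
    return code.index(0) if 0 in code else len(code)


def ones_count_decode(code: List[int]) -> int:
    """Number of 1s: insensitive to where a misplaced bit sits."""
    return sum(code)


clean = [1, 1, 1, 1, 0, 0, 0, 0]
bubbled = [1, 1, 0, 1, 1, 0, 0, 0]  # hypothetical bubble at bit 2
print(first_zero_decode(clean), ones_count_decode(clean))      # 4 4
print(first_zero_decode(bubbled), ones_count_decode(bubbled))  # 2 4
```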
Taking into account the variability effects described in Section 4, it was decided that a 32-level flash TDC would satisfy our requirement of a conversion time of the order of 2ns. The 32-level flash TDC, shown in Figure 13, uses Flip-Flops to compare the signal phases; these were placed in different CLBs in order to achieve maximum uniformity of the interconnect delays for each stage. In addition, two XORCY gates are used to drive the Stop and Start signals. Each level of the TDC is connected to a counter that totals the number of times the level produced a logic 1, which is later used to calculate the step size of each level. A reference counter is also used to count the clock cycles for which the TDC is operational. When the reference counter reaches a predefined value, in this case approximately one million clock cycles (2^20), a signal goes high, resets the Flip-Flops and stops the count at each level of the flash TDC. The value of each counter is then extracted individually using a multiplexer (not shown in Figure 13 for simplicity) and displayed on the 7-segment display of the FPGA board.
Since some of the Flip-Flops may go into a metastable state and hence produce inaccurate results, sufficient time was given for the Flip-Flops to resolve before their values were sampled. The value of τ (resolution time constant) for the Flip-Flops is of the order of 50ps [12]. In an on-chip implementation, MUTEXes will be used instead of Flip-Flops to compare the signal phases, as described in Section 3, since they have a better metastability response.
 
 
Figure 13 Flash TDC implementation 
 
6. Calibration 
 
To calculate the operational time interval of the proposed TDC, it was necessary to know the number of times all the Flip-Flops were high, the number of times all the Flip-Flops were low, the number of times at least one Flip-Flop was high and the number of times at least one Flip-Flop was low. To extract these parameters, four additional counters were added to the implementation, adding extra precision to the measurements taken. The area taken by the whole design, including the additional hardware, is shown in Figure 14, which was generated by the PACE software that is part of the Xilinx ISE.
         
Figure 14 FPGA placement of the flash TDC (regions: MUX, counters, TDC)
To calibrate the TDC, two asynchronous oscillators 
were used to drive the Start and the Stop signals. The 
clock and the data frequencies were set to 10.5 MHz 
and 10.3 MHz respectively. Since the two oscillators 
are not locked together all overlap times for data and 
clock will be generated with equal probability, 
providing a realistic situation where the signals 
measured can change phase at any time. 
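The calibration relies on the phase between the two unlocked oscillators being uniformly distributed, so that the fraction of cycles in which a timing window is hit equals the window width divided by the period. A minimal Monte Carlo sketch of that principle: instead of modelling the oscillators themselves, it draws a uniform random phase per cycle (seeded), which is an idealisation of the unlocked 10.5 MHz and 10.3 MHz sources. The 1.88ns window is only an example value.

```python
import random

DATA_PERIOD_NS = 1e3 / 10.3   # 10.3 MHz data oscillator -> ~97.087 ns
CYCLES = 2 ** 20              # length of one measurement run


def fraction_in_window(window_ns: float, cycles: int = CYCLES, seed: int = 2) -> float:
    """Fraction of cycles whose uniformly random phase falls in [0, window)."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(cycles)
               if rng.uniform(0.0, DATA_PERIOD_NS) < window_ns)
    return hits / cycles


window = 1.88  # ns, example window width
measured = fraction_in_window(window)
print(f"measured {measured:.5f}, expected {window / DATA_PERIOD_NS:.5f}")
```

Inverting the relation, a count c over N cycles corresponds to a window of c·T/N, which is the conversion used in Section 7.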
The time interval of most interest is when the data signal is changing from low to high, as shown in Figure 15. It is useful to note that each of the Flip-Flops is set either to a logic 1 or a logic 0 on each rising clock edge, depending on the relative delays of the clock and data paths at each Flip-Flop input. Consequently, the TDC had to be inhibited from counting the high-to-low transitions of the data, which would produce inaccurate results. This was achieved by gating all the counters at each level, as well as the additional counters for all Flip-Flops high, all Flip-Flops low, at least one Flip-Flop high and at least one Flip-Flop low. The gating was implemented using an extra Flip-Flop in the TDC, with a delay in front of its data input and its output connected to an inverter (or the Q̄ output, if available). The data signal was delayed by approximately 23ns, taking into account the routing and the extra hardware used. The output of the inverter was then connected to an AND gate, as shown in Figure 16, whose output was used to gate the counters, thus preventing the high-to-low transitions of the data input from being measured.
 
 
Figure 15 Time interval measured 
 
 
Figure 16 Added delay stage on the data path 
7. Results 
As previously described, the TDC was run for a given period of time, as indicated by a reference counter, which in this instance was set to 2^20 clock cycles, whereupon all counts were frozen and extracted individually via a multiplexer; the results are shown in Table 3. To determine the resolution, and thus the time range, of the TDC, some statistical processing of the values was necessary.
To find the time range of the TDC, it is first necessary to find the number of recorded events with at least one Flip-Flop high and at least one Flip-Flop low. The number of recorded events with at least one Flip-Flop high, excluding those in which all Flip-Flops are high, is

$$\text{count\_c} - \text{count\_a} = 20182$$

The same can be done to find the number of recorded events with at least one Flip-Flop low, excluding those in which all Flip-Flops are low:

$$\text{count\_d} - \text{count\_b} = 20950$$
 
Counters Values Time/ps Expected 
time/ps 
% Error
Count_0 20290 1878 1797 4.53% 
Count_1 20182 1868 1735 7.69% 
Count_2 18072 1673 1673 0.00% 
Count_3 18322 1696 1611 5.28% 
Count_4 14194 1314 1549 -15.18% 
Count_5 16610 1537 1487 3.40% 
Count_6 16362 1514 1425 6.28% 
Count_7 16995 1573 1363 15.41% 
Count_8 12707 1176 1301 -9.60% 
Count_9 12445 1152 1239 -7.03% 
Count_10 10900 1009 1177 -14.29% 
Count_11 10540 975 1115 -12.52% 
Count_12 10655 986 1053 -6.36% 
Count_13 10625 983 991 -0.79% 
Count_14 8999 833 929 -10.37% 
Count_15 8437 781 867 -9.96% 
Count_16 7352 680 805 -15.51% 
Count_17 9102 842 743 13.32% 
Count_18 8854 819 681 20.26% 
Count_19 8539 790 619 27.57% 
Count_20 5400 499 557 -10.36% 
Count_21 5006 463 495 -6.51% 
Count_22 3628 335 433 -22.57% 
Count_23 3178 294 371 -20.87% 
Count_24 3282 303 309 -1.93% 
Count_25 3445 318 247 28.67% 
Count_26 1592 147 185 -20.72% 
Count_27 0 0 123 -100.00% 
Count_28 53 4.9 61 -92.08% 
Count_29 928 85 0  
Count_30 957 88 -61 -242.98% 
Count_31 131 12 -123 -109.79% 
All FFs one (count_a) 39777    
All FFs zero (count_b) 303259    
At least one FF 1( count_c) 59959    
at least one FF zero (count_d) 324209    
Table 3 Extracted values for each counter 
These two values are expected to be roughly the same.
Their mean, 20566, is the number of recorded events
with at least one Flip-Flop high and at least one
Flip-Flop low, and is therefore the number expected to
be seen in count0. Here the mean is slightly greater
than count0 = 20290; this difference of about 276
counts (roughly 1.4%) can be considered negligible.
The full range of time from
one Flip-Flop low and 31 Flip-Flops high to 31
Flip-Flops low and a single Flip-Flop high can then be
found by multiplying the value of the first-stage
counter by the clock period (97.087ns), since this
counter records the proportion of the cycle for which
there is a mix of logic 1s and logic 0s in the
Flip-Flops, and dividing by the number of clock cycles
indicated by the reference counter, in this case 2^20,
hence

$$\mathrm{Time\_range} = \frac{20290 \times 97.087\,\mathrm{ns}}{2^{20}} = 1.88\,\mathrm{ns}$$
This suggests a step size between successive
Flip-Flops of approximately 58ps (1.88ns/32).
However, to avoid the end effect described in Section 4,
only stages 2 to 29 (27 steps) are used; as a result, the
actual time range is of the order of 1.67ns, as given by
count2 (1673ps), with a step size of 61.9ps between
successive changes in the number of Flip-Flops that
produce a logic 1.
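The conversion from counter value to time, and the two step sizes quoted above, can be reproduced with a few lines of Python. This is a sketch under the assumption that every counter is scaled the same way as count0, that is, multiplied by the 97.087ns clock period and divided by the 2^20 cycles of the reference counter; CYCLE_NS, N_CYCLES and counts_to_ps are illustrative names, not part of the original design.

```python
CYCLE_NS = 97.087   # clock period quoted in the text
N_CYCLES = 2 ** 20  # reference-counter length

def counts_to_ps(count: int) -> float:
    """Convert a counter value into the average time per clock cycle, in ps."""
    return count * CYCLE_NS * 1e3 / N_CYCLES

# count_0 records every cycle with a mix of 1s and 0s: the full range.
full_range_ps = counts_to_ps(20290)
# count_2 marks the start of the usable stages 2..29 (end effect excluded).
usable_range_ps = counts_to_ps(18072)

step_all_ps = full_range_ps / 32       # naive step over all 32 Flip-Flops
step_usable_ps = usable_range_ps / 27  # 27 steps between stages 2 and 29

print(f"full range   {full_range_ps:7.1f} ps, step {step_all_ps:.1f} ps")
print(f"usable range {usable_range_ps:7.1f} ps, step {step_usable_ps:.1f} ps")
```

This gives a full range of about 1878.6ps with a naive step of about 58.7ps, and a usable range of about 1673.3ps with a step of about 62ps; the 61.9ps in the text is consistent with truncating rather than rounding.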
As Table 3 shows, the measurements are discontinuous:
the counter values do not decrease monotonically with
the stage number. One way to address this is to sort
the values of the counters (0 to 31) in decreasing order
of magnitude, as shown in Table 4.
                          
Counter      Value  Time/ps  Expected time/ps   % Error
count0       20290     1878              1822     3.10%
count1       20182     1868              1759     6.22%
count3       18322     1696              1696     0.00%
count2       18072     1673              1633     2.43%
count7       16995     1573              1570     0.18%
count5       16610     1537              1507     1.99%
count6       16362     1514              1445     4.83%
count4       14194     1314              1382    -4.92%
count8       12707     1176              1319   -10.83%
count9       12445     1152              1256    -8.30%
count10      10900     1009              1193   -15.46%
count12      10655      986              1130   -12.77%
count13      10625      983              1068    -7.90%
count11      10540      975              1005    -2.92%
count17       9102      842               942   -10.58%
count14       8999      833               879    -5.28%
count18       8854      819               816     0.37%
count19       8539      790               753     4.86%
count15       8437      781               691    13.03%
count16       7352      680               628     8.34%
count20       5400      499               565   -11.58%
count21       5006      463               502    -7.79%
count22       3628      335               439   -23.62%
count25       3445      318               376   -15.39%
count24       3282      303               314    -3.27%
count23       3178      294               251    17.08%
count26       1592      147               188   -21.80%
count30        957       88               125   -29.49%
count29        928       85              62.8    36.75%
count31        131       12                 0
count28         53        4               -62  -107.81%
count27          0        0            -125.6  -100.00%

Table 4 Values for each counter after ordering
After ordering the counter values of each stage, the
previous calculations can be repeated. The step size in
this case is 62.8ps, as given by count3. The time for
each stage is then plotted against the number of high
Flip-Flops, as shown in Figure 17. The accuracy of the
TDC can then be calculated before and after ordering,
using the equation for error measurement described in
Section 4. Before ordering the standard deviation was
109.1, and after ordering it was 69.3. Although this is
not the factor-of-two reduction in noise that the theory
suggested, it can be considered very close to the target,
taking into account that the routing of the design is
random.
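Section 4's error equation is not repeated here, so the sketch below assumes that the quoted accuracy figures are the sample standard deviation of measured minus expected time (in ps) over positions 2 to 29, using the same linear-ramp assumption for the expected values as above. A hand check against the tabulated values suggests this lands close to the 109.1 and 69.3 quoted, but it remains an interpretation rather than the authors' stated formula; residuals_ps is a hypothetical helper.

```python
import statistics

CYCLE_NS = 97.087   # clock period quoted in the text
N_CYCLES = 2 ** 20  # reference-counter length

# Raw values of count_0 .. count_31, copied from Table 3.
COUNTS = [20290, 20182, 18072, 18322, 14194, 16610, 16362, 16995,
          12707, 12445, 10900, 10540, 10655, 10625, 8999, 8437,
          7352, 9102, 8854, 8539, 5400, 5006, 3628, 3178,
          3282, 3445, 1592, 0, 53, 928, 957, 131]

def residuals_ps(values):
    """Measured minus expected time (ps) for positions 2..29.

    Assumption: the expected time is a linear ramp anchored at position 2
    and falling to zero at position 29 in 27 equal steps.
    """
    t = [v * CYCLE_NS * 1e3 / N_CYCLES for v in values]
    step = t[2] / 27
    return [t[k] - (t[2] - (k - 2) * step) for k in range(2, 30)]

before = statistics.stdev(residuals_ps(COUNTS))
after = statistics.stdev(residuals_ps(sorted(COUNTS, reverse=True)))
print(f"std before {before:.1f}, after {after:.1f}, ratio {before / after:.2f}")
```

The ratio printed at the end comes out close to the factor of 1.57 quoted in the conclusions.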
Higher resolution can be achieved by adding more
stages to the TDC, at the cost of increased silicon area.
The conversion time can also be improved by
implementing the design in an FPGA manufactured in a
more advanced CMOS process, or by using PlanAhead,
an additional software package offered by Xilinx,
which can control, to some extent, the routing of the
design.
 
Figure 17 Plotted values for each counter after
ordering (time, ps, versus number of high Flip-Flops)
 
8. Future Work and Conclusions 
 
The flash TDC described in this paper offers the
possibility of measuring time intervals of the order of
a few picoseconds. Instead of explicit delay blocks, the
method uses the inherent process variations of the
elements in the measurement circuit to quantize a step
interval, thus reducing the silicon area overhead. The
results from the implementation of the flash TDC using
asymmetric MUTEXes suggested resolutions down to a
few picoseconds. The paper takes account of the end
effect in order to maintain accuracy near the limits of
the TDC. The proposed design was implemented as a
proof of concept in a Xilinx FPGA. The resolution
achieved before any modifications were made to
improve the routing of the design was approximately
190 picoseconds; after the XORCY gates were used to
drive the Start and Stop signals, the resolution
improved to 62 picoseconds. This value can be
improved further by increasing the number of stages in
the TDC. A method that overcomes the problem of
discontinuity in the
measurements was also presented, in which the values
of each stage were sorted in order of magnitude; this
improved the standard deviation of the accuracy of the
TDC from 109.1 before ordering to 69.3 after, an
improvement by a factor of 1.57. Moreover, the
calibration technique presented in this paper is
straightforward, does not depend upon the process
variability to define a time step, and requires no
external equipment. An additional advantage of this
method is that it can be used with any FPGA
architecture without any modifications to the
hardware. The main applications to benefit from the
flash TDC will be on-chip testing of synchronizer
reliability and clock jitter. The technique is cost
effective because no external test equipment is
necessary, and it offers fast conversion times and high
resolution.
Regarding future work, a silicon chip implementation 
of the asymmetric MUTEX TDC will be undertaken. 
In the meantime, the FPGA-based TDC will be tested 
in other FPGA architectures to validate that the design 
is architecture independent.  
 
9. Acknowledgements

The authors would like to thank Alex Yakovlev for
helpful discussions and the anonymous reviewers, and
acknowledge funding from the UK EPSRC research
grant EP/C007298/1, without which this work would
not have been possible.
 
10. References
 
[1] M. A. Abas, G. Russell, D. J. Kinniment, "Design of sub-10-picoseconds on-chip time measurement circuit," Design, Automation and Test in Europe Conference and Exhibition, IEEE Computer Society, vol. 2, 2004.
[2] D. J. Kinniment, K. Heron, G. Russell, "Measuring Deep Metastability," 12th IEEE International Symposium on Asynchronous Circuits and Systems, pp. 2-11, March 2006.
[3] E. Raisanen-Ruotsalainen, T. Rahkonen, J. Kostamovaara, "Time interval measurements using time-to-voltage conversion with built-in dual-slope A/D conversion," in Proc. 1991 International Symposium on Circuits and Systems (ISCAS'91), vol. 5, pp. 2573-2576, Singapore, 1991.
[4] R. Nutt, "Digital time interval meter," Rev. Sci. Instrum., vol. 39, pp. 1342-1345, 1968.
[5] J. Kalisz, R. Szplet, R. Pelka, A. Poniecki, "Single-chip interpolating time counter with 200-ps resolution and 43-s range," IEEE Transactions on Instrumentation and Measurement, vol. 46, pp. 851-856, 1997.
[6] J. Song, Q. An, S. Liu, "A High-Resolution Time-to-Digital Converter Implemented in Field-Programmable-Gate-Arrays," IEEE Transactions on Nuclear Science, vol. 53, no. 1, pp. 236-241, 2006.
[7] C. T. Gray, W. Liu, W. A. M. Van Noije, T. A. Hughes, Jr., R. K. Cavin, "A sampling technique and its CMOS implementation with 1 Gb/s bandwidth and 25 ps resolution," IEEE Journal of Solid-State Circuits, vol. 29, pp. 340-349, Mar. 1994.
[8] P. Dudek, S. Szczepanski, J. Hatfield, "A high-resolution CMOS time-to-digital converter utilizing a Vernier delay line," IEEE Journal of Solid-State Circuits, vol. 35, pp. 240-247, Feb. 2000.
[9] P. M. Levine, G. W. Roberts, "A High Resolution Flash Time-to-Digital Converter and Calibration for System-on-Chip Testing," IEE Proceedings - Computers and Digital Techniques, vol. 152, no. 3, pp. 415-426, May 2005.
[10] C. A. Mead, L. A. Conway, Introduction to VLSI Systems, Addison-Wesley, 1980, 396 pp.
[11] S.-M. Kang, Y. Leblebici, CMOS Digital Integrated Circuits: Analysis and Design, 2nd ed., WCB/McGraw-Hill, 1999.
[12] N. Minas, D. J. Kinniment, G. Russell, A. Yakovlev, "Metastability in FPGA devices," 18th UK Asynchronous Forum, September 2006.
 
 
 
13th IEEE International Symposium on Asynchronous Circuits and Systems (ASYNC'07)
0-7695-2771-X/07 $20.00  © 2007
Authorized licensed use limited to: Newcastle University. Downloaded on June 08,2010 at 14:32:37 UTC from IEEE Xplore.  Restrictions apply. 
