Towards implementing multi-channels, ring-oscillator-based, Vernier
  time-to-digital converter in FPGAs: key design points and construction method by Cui, Ke et al.
Towards Implementing Multi-Channels,
Ring-Oscillator-Based, Vernier Time-to-Digital
Converter in FPGAs: Key Design Points and
Construction Method
Ke Cui, Xiangyu Li, Zongkai Liu and Rihong Zhu
Abstract—For TOF positron emission tomography (TOF PET)
detectors, time-to-digital converters (TDCs) are essential to re-
solve the coincidence time of the photon pairs. Recently, an
efficient TDC structure called ring-oscillator-based (RO-based)
Vernier TDC using carry chains was reported by our team.
The method is very promising due to its low linearity error
and low resource cost. However, the implementation complexity
is rather high especially when moving to multi-channels TDC
designs, since this method calls for a manual intervention to
the initial fitting results of the compilation software. In this
paper, we elaborate the key points toward implementing high
performance multi-channels TDCs of this kind while keeping
the implementation complexity as low as possible. Furthermore, we propose
an efficient fine time interpolator construction method called
the period difference recording which only needs at most 31
adjustment trials to obtain a targeted TDC resolution. To validate
the techniques proposed in this paper, we built a 32-channels TDC
on a Stratix III FPGA chip and fully evaluated its performance.
Code density tests show that the obtained resolution results lie in
the range of (23 ps ~ 37 ps), the differential nonlinearity (DNL)
results lie in the range of (-0.4 LSB ~ 0.4 LSB) and the integral
nonlinearity (INL) results lie in the range of (-0.7 LSB ~ 0.7 LSB)
for each of the 32 TDC channels. This paper greatly eases the
designing difficulty of the carry chain RO-based TDCs and can
significantly propel their development in practical use.
Index Terms—time-to-digital converter, field programmable
gate array, ring oscillator, carry chain, Vernier delay line, period
difference recording
I. INTRODUCTION
TIME-of-flight (TOF) PET consists of very fast detectors which utilize multi-channel time-to-digital converter
(TDC) modules to resolve the coincidence time of the photon
pairs. This helps to improve the tomographic reconstruction
quality while simultaneously reducing radiation doses and/or
scan times [1]. The performance of the overall PET system is
directly related to the precision of the used TDCs. In this
paper, we will focus on the design of highly accurate
multi-channel TDCs while keeping the implementation
complexity as low as possible.
This work was supported by the Fundamental Research Funds for the
Central Universities under Grants 30916014112-019 and 30916011349.
Ke Cui, Zongkai Liu and Rihong Zhu are with the MIIT Key Laboratory
of Advanced Solid Laser, Nanjing University of Science and Technology,
Nanjing, Jiangsu, China, and also with the Advanced Launching Co-innovation
Center, Nanjing University of Science and Technology, Nanjing, Jiangsu,
China (e-mail: njustcuik@njust.edu.cn).
Xiangyu Li is with the School of Computer Science and Engineering,
Nanjing University of Science and Technology, Nanjing, Jiangsu, China.
Most present TDC structures adopt a two-step time mea-
surement technique [2]-[5]. In this method, the first step
uses a coarse counter running at system clock rate (usually
corresponding to a period of several nanoseconds) to record
the elapsed coarse time to guarantee large dynamic range. The
second step adopts a fine time interpolator with subnanosecond
resolution to accurately record the time bin locating in the
specific system clock cycle at which the coarse time counter
is latched to guarantee high precision. The most commonly used fine
time interpolator techniques include: tapped delay line (TDL)
[3]-[12], pulse shrinking delay line [13] and Vernier delay
line [14]-[16]. Generally, there are two platforms to imple-
ment TDCs: application specific integrated circuits (ASICs)
and field programmable gate arrays (FPGAs). ASIC-based
TDCs have strong design flexibility and can utilize some very
beneficial analog circuits such as delay locked loops (DLLs)
contributing to an excellent delay line. However, their cost is
especially high when the production volume is low and the
development period is rather long. FPGA-based TDCs have
much less design freedom, since they are constrained to the digital
design space. However, the reconfigurability of FPGAs makes
the design much less expensive and allows it to be adjusted quickly
to meet new requirements.
J. Wu proposed the carry chain based structure as an effi-
cient TDL interpolator which is important to the development
of the FPGA-based TDCs [7]. The carry chains which widely
exist in modern FPGAs are specially provided by their vendors
to fulfill fast algorithm functions such as fast addition or
comparison. The delay time of a basic carry chain cell is
very small such that it is reasonably conceived as the most
ideal tool to fulfill the fine time interpolation task on FPGA
chips. That is the reason why the carry chain based TDL
TDCs have been studied extensively in recent years. However,
the existence of ultra-wide bins, which are physically determined
during the chip fabrication process, limits the precision of such
TDCs significantly. J. Wu proposed the wave-union A and B
methods to effectively subdivide the ultra-wide bin by multiple
measurements along a single carry chain and improved the
precision beyond its cell delay [6]. The wave-union methods
are then widely adopted by many later emerging FPGA-based
TDC designs [8], [11]. Another drawback of such TDCs is the
large differential nonlinearity (DNL) and integral nonlinearity
(INL) problem caused by uneven bin granularity of the used
carry chains. One possible technique to mitigate this problem
is to use the bin-by-bin calibration techniques [2]. However,
this incurs large memory and logic resource cost.
arXiv:1703.01082v2 [physics.ins-det] 9 Jun 2017
Fig. 1. Carry chain RO-based TDC. (a) overall structure; (b) timing diagram; (c) structure of the RO-based fine time interpolator.
Recently, we proposed a new ring-oscillator-based (RO-
based) TDC structure by organizing the carry chains in a
Vernier loop style [17]. A specific construction method to
set up two ROs with very little period difference was fully
illustrated. This method opened up a new way to utilize the
carry chains to build the fine time interpolator which led to
much reduced DNL and INL. In this paper, we report a 32-
channels TDC realized on a single FPGA chip by further
exploiting the method proposed in [17]. One main challenge
in implementation complexity emerges when moving to
multi-channels TDC designs, since the required effort
increases linearly with the channel number. This is
especially true because the design flow requires an exhaustive
manual intervention to the initial fitting results of the compilation
software to obtain the two ROs with a targeted period
difference. This paper proposes a new construction method
for building the two ROs with at most 31 trials per
TDC channel, which greatly reduces the design complexity.
In summary, the main contributions of this paper are:
key points to obtain high performance TDC by utilizing the
carry chain RO-based method; a new and highly efficient
construction method called the period difference recording
(PDR) to build the ROs; multi-channel ability and scalability
by utilizing the carry chain RO-based method in a single
FPGA chip.
II. MULTI-CHANNEL TDC DESIGN
A. Carry Chain RO-Based TDC Structure
The basic carry chain RO-based Vernier TDC structure
is depicted in Fig.1. It uses two steps to measure a time
interval: a coarse and a fine time measurement step
(Fig.1(a)). The corresponding timing diagram is shown in
Fig.1(b). The coarse counter running at the system clock rate
is adopted to record the coarse time. The clock extraction
module is designed to find the closest clock signal in time
after the hit signal and extract the delayed hit and clock signals
pair to the fine time interpolator module to measure the fine
time interval between them. The working principle and circuit
implementation of the clock extraction module can be found
in Section II-B. Two signals labeled ctrl_1 and ctrl_2 are
generated to denote the proper timing for the time assembler
module to latch the coarse and fine counter values correctly
and combine them together to produce the final timestamp.
Fig.1(c) shows the detailed structure of the RO-based Vernier
fine time interpolator which connects the last cell of the carry
chain back to its first cell. The two ROs are composed of
different numbers of carry chain cells (or delay units - DUs)
and hold different oscillation periods. The DU works as basic
delay unit and a complete Vernier delay line can contain an
even or odd number of DUs. The period difference between the
two ROs determines the resolution of the TDC. Additionally,
each RO contains a pulse width reshaping module to maintain
the stability of the positive duration of the oscillation signal
propagating along it. Its circuit implementation is shown in the
rightmost part of Fig.1(c). According to the timing diagram
(Fig.1(b)), the leading signal of a fine time interval event (the
hit_syn signal in Fig.1) is fed to the slow RO while the lagging
one (the clk_syn signal in Fig.1) to the fast RO. The fine
timestamp is obtained by reading out the fine time counter
which records the oscillation number at the moment that the
lagging signal catches up with the leading signal. The
DNL of such TDCs avoids the bad influence of the uneven bin
granularity, since the resolution is determined by the physical
length difference of the two ROs rather than by the bin widths of
the used carry chains as in the TDL form. Much reduced
DNL has been observed in [17]. However, since the ROs are
not compensated and stabilized during the time measurement,
the oscillation number cannot be set too large if a small
precision RMS is to be assured, which means that the tradeoff between the
resolution and the precision should be carefully considered
by the designers.
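The catch-up behaviour described above can be illustrated with a minimal simulation sketch. It assumes ideal, jitter-free ROs with integer periods in picoseconds; the function name `vernier_fine_count` and the example periods are hypothetical and chosen only for illustration, and the default counter limit of 127 reflects the 7-bit fine time counter used later in this paper.

```python
def vernier_fine_count(dt_ps, slow_period_ps, fast_period_ps, max_count=127):
    """Count oscillations until the lagging (fast RO) edge catches the leading one.

    dt_ps: fine time interval between hit_syn (leading) and clk_syn (lagging).
    Assumes ideal ROs without jitter; all times are integers in picoseconds.
    """
    if fast_period_ps >= slow_period_ps:
        raise ValueError("the fast RO must have the shorter period")
    slow_edge = 0        # hit_syn starts the slow RO at t = 0
    fast_edge = dt_ps    # clk_syn starts the fast RO dt_ps later
    count = 0
    # Each oscillation shrinks the gap by (slow_period_ps - fast_period_ps).
    while fast_edge > slow_edge:
        if count == max_count:
            raise OverflowError("fine counter overflow")
        slow_edge += slow_period_ps
        fast_edge += fast_period_ps
        count += 1
    return count
```

With a hypothetical period difference of 30 ps (5030 ps versus 5000 ps), a 300 ps interval is resolved after 10 oscillations, which shows why the period difference sets the resolution and why a small difference lengthens the measurement.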
B. Key Design Points
There are two key points for the designers to build the RO-
based Vernier TDC: the clock extraction module and the fine
time interpolator module.
1) key design point for the clock extraction module: The
basic task for the clock extraction module is to locate the
clock signal which is nearest in time after the hit signal
and then extract it out. The outputted leading and lagging
signals are termed as hit_syn and clk_syn correspondingly
and the fine time interval between them will be measured
by the following fine time interpolator module. The circuit
implementation of the clock extraction module is depicted
in Fig.2. It can be seen that an undesired additional delay
τd = τreg + τ2 − τ1 is added to the original fine time interval
and causes a minimal oscillation number n0 = τd/LSB, where τreg
represents the delay introduced by the sampling D-type flip-flops,
τ1 and τ2 represent the adjustable delays introduced by
the delay compensation units for the hit_in and clk_in signals
respectively, and LSB represents the resolution of the TDC.
The existence of n0 deteriorates the TDC performance, since
the precision RMS increases about proportionally to the square
root of the overall oscillation number [18].
Fig. 2. Circuit implementation and timing diagram of the clock extraction
module. Here t1 and t2 are the arrival time of the hit_in and clk_in signals
respectively, while t1* and t2* are the corresponding output time after passing
the clock extraction module.
Fig. 3. Relative phase relationship between the hit_syn and clk_syn signals, cases (a) to (e).
The timing diagram
in Fig.2 shows that the following formula should be satisfied
to guarantee the requirement that the renewed arrival time of
the clk_syn signal should not exceed the positive duration of
the hit_syn signal for correct operation of the sampling D-type
flip-flop in the fine time interpolator:
0 ≤ τd ≤ (Tpos − Tclk) (1)
where Tpos represents the positive duration of the hit_syn
signal and Tclk represents the period of the system clock. The
compensation delays τ1 and τ2 are intentionally added to try
to make τd as small as possible leading to the least n0 to
gain the best precision performance. The compensation unit is
composed of 32 cascaded look-up table (LUT) implemented
NOT gates. According to our experimental experience, this
gate amount is adequate to find a good enough parameter
set (τ1, τ2). The actually used gates number for each signal
is manually adjusted and determined by using the resource
editor tool provided by the FPGA manufacturers (for example
the engineering change orders - ECO tool by Altera and the
FPGA editor tool by Xilinx). The adjustment criterion is to
make τd lie in the range constrained by equation (1) and
as close to zero as possible. However, this task is difficult to
fulfill directly since the timing parameters τ1, τ2, τreg
and Tpos are all very hard to know exactly. To overcome this
difficulty, we propose to infer the actual τd value by observing
the distribution of the outputted fine time counter values n
captured from the fine time interpolator module. The cases of
the relative phase between the hit_syn and clk_syn signals are
depicted in Fig.3 and the corresponding distributions of n are
summarized in Table I, where the parameter nm represents the
maximal fine time counter value.
TABLE I
DISTRIBUTIONS OF THE FINE TIME COUNTER VALUES n CORRESPONDING TO DIFFERENT CASES IN FIG.3
case number | description | distribution of n
(a) | all the occurring time points of the clk_syn are late to the positive duration of the hit_syn | {0}
(b) | all the occurring time points of the clk_syn are prior to the positive duration of the hit_syn | {0}
(c) | part of the occurring time points of the clk_syn are late to the positive duration of the hit_syn | {0} ∪ {(n0, nm)}
(d) | part of the occurring time points of the clk_syn are prior to the positive duration of the hit_syn | {(0, nm)}
(e) | all the relative phase relations are in the expected range satisfying equation (1) | {(n0, nm)}
According to the cases listed in Table I, the following steps
are applied to efficiently find an optimal set (τ1, τ2) for each
TDC channel:
1) According to the collected distributions of n, classify
which case the present parameter set (τ1, τ2) belongs
to. If it accords with case (c), go to step 2; if it accords
with case (d), go to step 3; if it accords with case (e),
go to step 4; otherwise modify the number of cascaded
NOT gates in either of the two delay compensation units
until one of the cases (c)~(e) appears.
2) Iteratively reduce the number of gates of the delay
compensation unit in the clk_syn signal path until case (e)
appears and then go to step 4.
3) Iteratively reduce the number of gates of the delay
compensation unit in the hit_syn signal path until case (e)
appears and then go to step 4.
4) Decrease n0 by iteratively reducing the number of gates
of the delay compensation unit in the clk_syn signal path
until the minimal achievable positive n0 is found.
The number of gates removed per iteration in steps 2 ~ 3 is usually set larger
than 1 to speed up the search, while the number in step 4
is set to just 1 to find the optimal n0. The above mentioned
manual adjustment for the clock extraction module is very
useful, since as the targeted number of TDC channels increases,
the initial fitting results are at greater risk of falling outside case (e)
and leading to operation failure. Even if the automatic fitting results
initially accord with case (e), it is still very beneficial to apply
step 4 to find the optimal n0.
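The classification in step 1 can be sketched in a few lines. This is only an illustrative heuristic under stated assumptions, not the procedure used in this work: it assumes that case (c) shows up as a lone 0 separated from the remaining codes by a gap, while case (d) shows a range starting at 0 without such a gap. The function names and the `gap` parameter are hypothetical.

```python
def classify_case(fine_counts, gap=1):
    """Map an observed set of fine counter values n to a case of Table I.

    Heuristic assumption: a lone 0 separated from the remaining values by more
    than `gap` codes indicates case (c); a range that starts at 0 without such
    a separation indicates case (d).
    """
    values = set(fine_counts)
    if not values:
        raise ValueError("no fine counter values collected")
    if values == {0}:
        return "a_or_b"   # cases (a) and (b) both yield only zeros
    if 0 not in values:
        return "e"        # all counts lie in (n0, nm) with n0 > 0
    smallest_nonzero = min(values - {0})
    return "c" if smallest_nonzero > gap else "d"


def next_action(case):
    """Return the adjustment suggested by steps 1 to 4 for a classified case."""
    actions = {
        "a_or_b": "change the NOT-gate count in either delay compensation unit",
        "c": "shorten the clk_syn compensation chain until case (e) appears",
        "d": "shorten the hit_syn compensation chain until case (e) appears",
        "e": "shorten the clk_syn chain one gate at a time to minimise n0",
    }
    return actions[case]
```

In practice the designer would collect a large number of fine counter values per trial and inspect the histogram; the sketch only captures the decision logic of Table I.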
2) key design point for the fine time interpolator module:
The fine time interpolator module contains two structure-
symmetric ROs built from the carry chains. The structural
similarity is obtained by utilizing the partition based two-
step construction method proposed in [17]. The complete
oscillation period of the RO is composed of three parts: τp1
caused by the carry chain (the path encompassed by the dotted
rectangle), τp2 caused by the connection path between the end
of the carry chain and the pulse reshaping module (the bold
line), and τp3 caused by all the remaining logic units and paths
in the RO as shown in Fig.4.
It is clear that τp3 keeps constant once the RO is set
up. In our previous work, τp2 is also assumed to be un-
changed after manual intervention at the fine tuning point,
so an iterative adjustment process of assigning different DU
number combination sets to alter τp1 for the two ROs is
adopted. The adjustment task is performed as follows: cut
off the connection at the fine tuning point of the RO whose oscillation
period is longer; shorten the length of its carry chain by
one DU; finally reconnect the new shorter carry chain to the
corresponding pulse width reshaping module. The oscillation
periods are observed on an external oscilloscope by routing
the oscillation signals out of the FPGA chip. This
adjustment principle means that the DU number combination
sets can only be assigned in one direction, toward the
front end of the carry chain, which may cause many
potential DU number combination sets to be missed, since τp2
actually becomes uncertain after each adjustment.
That is, when an RO needs to
reduce its oscillation period, the DU number may actually
need to be increased by 1 instead of decreased by 1, if τp2 decreases so
dramatically that the entire oscillation period decreases even
with the larger DU number. In our example design, the overall
length of a complete carry chain is 32 and this theoretically
gives 32 × 32 = 1024 possible DU number combination sets
if both τp1 and τp2 are viewed as changeable. Releasing the
adjustment constraint generates a huge DU number combination
set space and gives flexible design freedom. This point is also
important in multi-channels TDC designs since the more DU
number combination candidates can be used, the fewer design
failures may be encountered when using such a short carry
chain (32 DUs in total). If a design failure happens for a TDC
channel, the designer has to allocate a new physical region on
the FPGA chip and reconstruct this bad channel, which
greatly increases the design complexity. Although extending the
length of the used carry chain is also a feasible way to improve the
design success rate, it causes a much larger resource cost,
which is especially significant in multi-channels TDC designs.
Fig. 4. Delay paths along the RO.
C. Period difference recording method for fine time interpolator construction
In this section we propose the PDR method, by using
which every possible DU number combination set can be
covered with very few total adjustment trials. To clarify the
PDR method, we define the oscillation period of the fast RO
as τf, i (i = 32, 31, ..., 1), when the i-th DU number of
the fast RO is connected to the fine tuning point. Similarly
we define the oscillation period of the slow RO as τs, j
(j = 32, 31, ..., 1), when the j-th DU number of the slow
RO is connected to the fine tuning point. Additionally we
define the oscillation period difference between the fast and
slow ROs as ∆τi, j = τs, j − τf, i corresponding to the DU
number combination set (i, j). By using the above definitions,
we illustrate the PDR design flow as follows:
1) Test and record the result of ∆τ32, 32, and go to step 2.
2) Fix i = 32, enumerate j = 31, 30, ..., 1, test and record
the results of ∆τ32, j , and go to step 3.
3) Fix j = 32, and initialize i = 31 if this is the first time
entering this step. Test and record the result of ∆τi, 32,
and go to step 4.
4) For the current i and each DU number combination set
(i, j) (j = 31, 30, ..., 1), compute ∆τi, j = ∆τ32, j +
∆τi, 32 − ∆τ32, 32. If any ∆τi, j lying in the targeted
resolution range exists, output the combination set (i, j)
and stop the iteration with success; otherwise set
i = i − 1 and go to step 3. If no satisfying ∆τi, j can be
found even with i = 1, stop the iteration with failure.
The PDR method only needs to record at most 1 + 31 + 31 = 63 results but
can cover as many as 1024 different DU number combination
sets, which greatly reduces the design complexity. It originates
from the following identity:
∆τi, j = τs, j − τf, i
= τs, j − τf, 32 + τs, 32 − τf, i + τf, 32 − τs, 32
= ∆τ32, j +∆τi, 32 −∆τ32, 32
(2)
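The four-step flow and the bookkeeping of recorded results can be sketched as follows. This is a minimal sketch, not the tooling used in this work: the callback `measure(i, j)` is a hypothetical stand-in for one manual trial (re-connecting the carry chains and reading the period difference from the oscilloscope), and the function name `pdr_search` is likewise an assumption.

```python
def pdr_search(measure, target_ps, size=32):
    """Period difference recording (PDR) sketch following steps 1 to 4.

    measure(i, j) returns the recorded period difference (slow minus fast, ps)
    for DU combination (i, j); it stands in for one manual oscilloscope trial.
    target_ps is the (low, high) resolution window in ps.
    Returns ((i, j), estimate, trials) on success, (None, None, trials) on failure.
    """
    low, high = target_ps
    trials = 0

    def record(i, j):
        nonlocal trials
        trials += 1
        return measure(i, j)

    ref = record(size, size)                                    # step 1
    row = {j: record(size, j) for j in range(size - 1, 0, -1)}  # step 2
    for i in range(size - 1, 0, -1):
        col_i = record(i, size)                                 # step 3
        for j in range(size - 1, 0, -1):                        # step 4
            estimate = row[j] + col_i - ref                     # equation (2)
            if low <= estimate <= high:
                return (i, j), estimate, trials
    return None, None, trials
```

The trial counter makes the cost explicit: one reference result, `size - 1` results with i fixed, and at most `size - 1` results with j fixed, so the search never records more than `2 * size - 1` values.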
In practical use, the period difference between the two ROs
is obtained by observing an external oscilloscope to collect
the oscillation number k, the initial period difference ∆τini
and the final period difference ∆τfnl for an arbitrary DU
number combination set (i, j) (i, j = 32, 31, ..., 1), and then
the period difference is calculated as ∆τi, j = (∆τfnl − ∆τini)/k.
For example, Fig.5 shows a real waveform captured during our
design process by a 2.5 GS/s Tektronix oscilloscope (model:
DPO 3032), of which channel 1 represents the
slow RO while channel 2 represents the fast RO. It can
be seen that Fig.5(a) shows the entire oscillation waveform
giving k = 25, Fig.5(b) shows the locally enlarged waveform
of the first two oscillations giving ∆τini ≈ 300 ps, and
Fig.5(c) shows the locally enlarged waveform of the last
two oscillations giving ∆τfnl ≈ 1000 ps, so ∆τi,j can be
estimated as (1000 ps − 300 ps)/25 = 28 ps. Fig.5 also shows that
the realized ROs have an approximately 200 MHz frequency
(corresponding to a period of about 5 ns).
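The arithmetic of this oscilloscope-based estimate is small enough to check directly. The helper names below are hypothetical; the numbers are the ones read from Fig.5.

```python
def period_difference_ps(initial_offset_ps, final_offset_ps, oscillations):
    """Average period difference per oscillation from two oscilloscope readings."""
    if oscillations <= 0:
        raise ValueError("the oscillation number k must be positive")
    return (final_offset_ps - initial_offset_ps) / oscillations


def frequency_mhz(period_ps):
    """Convert an oscillation period in picoseconds to a frequency in MHz."""
    return 1e6 / period_ps


# Readings from Fig.5: k = 25, initial offset about 300 ps, final about 1000 ps.
delta_tau = period_difference_ps(300, 1000, 25)
ro_frequency = frequency_mhz(5000)   # a 5 ns period
```

Averaging over k oscillations is what makes a period difference of a few tens of picoseconds observable on an instrument whose single-shot reading is much coarser.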
Fig. 5. Oscillation waveforms for the DU number combination set (i, j).
[Waveform annotations: k = 25, ∆τini ≈ 300 ps, ∆τfnl ≈ 1000 ps.]
According to our design experience, a 16 × 16 design space
generating 256 possible DU number combination sets is large
enough to construct our 32-channels TDC. No failure happened
during the whole design process. With this reduced design space,
at most 1 + 15 + 15 = 31 results need to be recorded per channel.
As an example, we summarize
the recorded period difference values with the 16 × 16
design space for the TDC channel No.1 in Table II.
Since a target resolution range of 25 ~ 35 ps is chosen in
our design, by exploiting Table II, we can easily identify the
satisfying DU number combination sets by applying equation
(2). For example, the combination set (i, j)=(25, 30) gives
∆τ25, 30 = −168 + 62 − (−133) = 27 ps, which makes it
a valid DU combination candidate. It should be noticed
that the PDR method just provides an estimation of the
resolution whose accurate value should be obtained from the
code density tests as performed in Section III.
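Applying equation (2) to the recorded values of Table II can be reproduced with a short script. The dictionary and function names are hypothetical; the numbers are copied from Table II for TDC channel No.1.

```python
# Recorded period differences from Table II (ps), TDC channel No.1.
# D_I32[i] is the value recorded with j fixed at 32; D_32J[j] with i fixed at 32.
D_I32 = {32: -133, 31: -175, 30: -63, 29: -96, 28: 70, 27: 45, 26: 88, 25: 62,
         24: 131, 23: 145, 22: 130, 21: 125, 20: 190, 19: 450, 18: 135, 17: 90}
D_32J = {32: -133, 31: -131, 30: -168, 29: -256, 28: -230, 27: -371, 26: -344,
         25: -383, 24: -333, 23: -400, 22: -286, 21: -433, 20: -400, 19: -406,
         18: -362, 17: -400}


def estimate(i, j):
    """Estimated period difference for combination (i, j) via equation (2)."""
    return D_32J[j] + D_I32[i] - D_I32[32]


def candidates(low=25, high=35):
    """All (i, j, estimate) triples whose estimate lies in the target window."""
    return [(i, j, estimate(i, j))
            for i in D_I32 for j in D_32J
            if low <= estimate(i, j) <= high]
```

Calling `estimate(25, 30)` reproduces the 27 ps of the worked example, and `candidates()` lists every combination of the 16 × 16 design space that falls inside the 25 ~ 35 ps window without any further trial on the chip.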
TABLE II
RECORDED PERIOD DIFFERENCE VALUES FOR THE TDC CHANNEL NO.1 IN OUR DESIGN
i (fixing j = 32) | ∆τi,32 (ps) | j (fixing i = 32) | ∆τ32,j (ps)
32 | -133 | 32 | -133
31 | -175 | 31 | -131
30 | -63 | 30 | -168
29 | -96 | 29 | -256
28 | 70 | 28 | -230
27 | 45 | 27 | -371
26 | 88 | 26 | -344
25 | 62 | 25 | -383
24 | 131 | 24 | -333
23 | 145 | 23 | -400
22 | 130 | 22 | -286
21 | 125 | 21 | -433
20 | 190 | 20 | -400
19 | 450 | 19 | -406
18 | 135 | 18 | -362
17 | 90 | 17 | -400
III. TEST RESULTS
This paper built a 32-channels TDC prototype on a single
EP3SE110F1152I3 Stratix III device from Altera using a self-designed
test board. The coarse counter is set as 9 bits width
running at 600 MHz clock rate (corresponding to 1667 ps
period). The fine time counter is set as 7 bits wide. In
total, 16 bits represent a timestamp. The resource report
after compilation shows that the LUT occupation is
319/85200 (0.4%) and the register occupation is
104/85200 (0.2%) per TDC channel. So only about 13% of the LUTs
and 7% of the registers of the FPGA chip are used by the 32-channels
TDC design.
Fig. 6. The DNL (a) and INL (b) of the TDC channel No.1.
Fig. 7. Transfer curves. (a) using a step size of 100 ps and a dynamic range
of 3.5 ns; (b) using a step size of 500 ps and a dynamic range of 30 ns.
[Plot annotations: axes are time interval (ps) versus TDC output result (ps); fitted lines y = 1.003x + 2252 in (a) and y = 1.005x + 2244 in (b).]
During tests, all recorded timestamps were transferred to PC
via the USB 2.0 bus for further analysis. Code density tests
were applied to test the performance of the DNL and INL.
Furthermore, a precision RMS test was also performed via two
TDC channels by feeding two hit signals having a fixed delay
value. To reduce any possible statistical error of counting, the
test sample size was set to be one million. All the mentioned
tests were conducted using nominal supply voltages and at an
ambient temperature of around 20°C.
A. Specific performance characterization of TDC channel
No.1
To apply the code density test, an arbitrary function generator
AFG3251 was used to generate pulsed signals with a
repetition frequency of 500.1 kHz. The generator ran under a
clock uncorrelated with that of the TDC to guarantee the correctness
of the code density tests. The pulsed signals were introduced
into the FPGA chip acting as hit signals. The tested fine
timestamps for TDC channel No.1 lie in the range of (9 ~ 64),
so the LSB = 1667 ps/(64 − 9) = 30.3 ps. The obtained diagrams of
the DNL lying in the range of (-0.15 LSB ~ 0.08 LSB) and
the INL lying in the range of (-0.21 LSB ~ 0.28 LSB) are
depicted in Fig.6.
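How DNL and INL follow from a code density histogram can be sketched as below. The paper does not spell out its exact computation, so this uses a common formulation as an assumption: with hits uniformly distributed over one clock period, each bin width is proportional to its hit count, DNL is the relative deviation of each width from the LSB, and INL is the running sum of DNL. The function name and the example histogram are hypothetical.

```python
def code_density(hist, t_clk_ps):
    """Derive bin widths, DNL and INL from a code density histogram.

    hist: hit counts per fine code, assuming hits uniformly spread over one
    clock period of t_clk_ps picoseconds (uncorrelated hit source).
    Returns (lsb_ps, widths_ps, dnl_in_lsb, inl_in_lsb).
    """
    total = sum(hist)
    if total == 0:
        raise ValueError("empty histogram")
    # A bin that collects more hits covers a proportionally wider time span.
    widths = [t_clk_ps * count / total for count in hist]
    lsb = t_clk_ps / len(hist)                 # average bin width
    dnl = [w / lsb - 1.0 for w in widths]      # deviation of each bin, in LSB
    inl, running = [], 0.0
    for d in dnl:                              # INL accumulates the DNL
        running += d
        inl.append(running)
    return lsb, widths, dnl, inl
```

For a perfectly uniform histogram of 55 codes over a 1667 ps clock period, the sketch returns an LSB of about 30.3 ps and zero DNL and INL everywhere, which matches the resolution quoted for channel No.1.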
To test large time interval results and evaluate the precision
RMS, TDC channel No.2 was included. The AFG3251 was
used to generate two correlated hit signals with a programmed
delay value ranging in (0 ps ~ 30000 ps). The two hit signals
were fed to TDC channels No.1 and No.2 respectively by using
two coaxial cables of equal length. The TDC output results
were obtained by subtracting the time results of channel No.1
from those of channel No.2. Before test, a 740Zi Lecroy digital
oscilloscope working at the Random Interleaved Sampling
Mode (RIS) providing 200 Gs/s equivalent sampling rate was
used to determine the signal jitter introduced by the AFG3251
which turned out to be less than 8 ps. That value has little
influence on our final results. The transfer curves of the TDC
are depicted in Fig.7, of which Fig.7(a) uses a step size of 100
ps and a dynamic range of 3.5 ns while Fig.7(b) uses a step
size of 500 ps and a dynamic range of 30 ns. The fitted linear
curve has a slope very close to 1 which demonstrates that the
TDC has very good linearity performance. The offset in the
figure is mainly caused by the delay path difference of the
two TDC channels from the IO element to the TDC module
on the FPGA chip. During the transfer curve test process, the
precision RMS values at each time interval point are calculated
simultaneously which turn out to lie in the range of (32 ps ~
40 ps). As an example, the histograms of the time interval
results with values of 2324 ps and 32737 ps are depicted in
Fig.8.
Fig. 8. Histogram of the time interval results between the TDC channels
No.2 and No.1.
[Plot annotations: panels (a) and (b); axes are time interval (ps) versus count; mean 32737 ps, precision 34 ps RMS.]
Fig. 9. Test configuration for the proposed 32-channels TDC.
[Diagram labels: AFG3251 input; delay modules No.1 to No.32; TDC channels No.1 to No.32; FPGA EP3SE110F1152I3.]
B. Performance summarization of all the 32 TDC channels
In this section, a specific test configuration depicted in Fig.9
was applied to help simplify the test process. The AFG3251 is
used to generate hit signals with repetition frequency of 500.1
kHz. Each of the 32 TDC channels is given its own independent
delay module, which is composed of an even number of cascaded
NOT gates (the number of gates is randomly set in the range
of 40 ~ 100). This configuration is very useful to evaluate the
precision RMS under large time interval tests such as 4 ~ 20
ns.
By analyzing the distribution of the timestamps for
each of the 32 TDC channels, all important performance parameters
including fine time counter range, resolution, equivalent
bin width, DNL, INL and precision RMS can be obtained.
The detailed parameter results are listed in Table III. In this
table, the resolution is calculated as LSB = 1667 ps/(nm − n0). The term
equivalent bin width weq can take effects of the various bin
widths into account [19]. It is calculated as weq = √(∑i wi³/W)
with W = ∑i wi, where wi represents the bin width for the
i-th bin number. All the wi values are obtained by the code
density tests.
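Both formulas can be checked numerically with a short sketch. The function names are hypothetical, and the default clock period of 1667 ps is the 600 MHz system clock of this design.

```python
import math


def resolution_ps(n0, nm, t_clk_ps=1667.0):
    """Resolution LSB = Tclk / (nm - n0) from the fine time counter range."""
    return t_clk_ps / (nm - n0)


def equivalent_bin_width(widths):
    """Equivalent bin width weq = sqrt(sum(wi^3) / W) with W = sum(wi)."""
    total = sum(widths)
    return math.sqrt(sum(w ** 3 for w in widths) / total)


# Channel No.1 of Table III has (n0, nm) = (9, 64).
lsb_channel_1 = resolution_ps(9, 64)
# For perfectly uniform bins the equivalent width equals the bin width itself.
uniform_weq = equivalent_bin_width([30.0] * 55)
```

Because wide bins are weighted by the cube of their width, any non-uniformity pushes weq above the average bin width, so a weq close to the LSB indicates evenly sized bins.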
From Table III, we conclude that the obtained resolutions
and equivalent bin widths all lie in the range of (23.2 ps ~
37.2 ps), and the fact that they are very close to each other
reflects that the TDC has good linearity performance [10]. The
obtained DNL results generally lie in the range of (-0.4 LSB
~ 0.4 LSB) with a maximal amplitude of 0.59 LSB (channel
number 6) and the obtained INL results generally lie in the
range of (-0.7 LSB ~ 0.7 LSB) with a maximal amplitude of
0.87 LSB (channel number 6). The obtained linearity is not as
good as that reported in [17]. One reason is that the physical
location of a TDC channel on the FPGA chip is found to
influence the linearity error significantly. However, we did not
optimize the physical locations in this design since it would
be considerably time consuming and not necessarily required
in most application cases. All the implementation regions
were automatically generated by the compilation software. If
the designers want to obtain TDC channels with very small
linearity error, manually assigning the implementation regions
and comparing their performance are recommended. Another
reason is that multiple channels may influence each other during
operation and deteriorate the linearity. Manually and
properly assigning the implementation regions may help improve
the linearity performance. Even so, the linearity performance
is still better than that of the TDL based method
utilizing carry chains, which usually exhibits a maximal amplitude
of 2 ~ 4 LSBs.
Large time interval results are obtained by subtracting the
time results calculated from TDC channel No.1 from those of
the TDC channels No.2 ~ 32 respectively. The precision RMS
is calculated from the corresponding time interval results for
each of the TDC channels. From Table III, it can be seen that
all of the precision RMS results lie in the range of (32 ps ~
39 ps).
TABLE III
TESTED PERFORMANCE OF EACH OF THE 32-CHANNELS TDC

| channel number | (n0, nm) | LSB (ps) | equivalent bin width (ps) | DNL (% LSB) | INL (% LSB) | mean delay value relative to channel No.1 (ps) | precision RMS relative to channel No.1 (ps) |
|---|---|---|---|---|---|---|---|
| 1 | (9, 64) | 30.3 | 30.5 | -15~8 | -20~28 | N/A | N/A |
| 2 | (13, 60) | 35.5 | 35.7 | -7~9 | -3~51 | 5182 | 34 |
| 3 | (12, 76) | 26.0 | 26.3 | -22~24 | -38~40 | 5689 | 38 |
| 4 | (12, 64) | 32.1 | 32.3 | -10~11 | -26~28 | 6891 | 36 |
| 5 | (8, 64) | 29.7 | 30.0 | -15~12 | -72~5 | 7231 | 34 |
| 6 | (9, 60) | 32.7 | 33.1 | -32~27 | -51~36 | 7983 | 35 |
| 7 | (9, 58) | 34.0 | 34.1 | -8~8 | -2~47 | 6420 | 33 |
| 8 | (8, 59) | 32.7 | 32.9 | -11~13 | -28~30 | 8821 | 32 |
| 9 | (8, 68) | 27.8 | 30.0 | -9~9 | -55~16 | 8136 | 38 |
| 10 | (6, 68) | 26.9 | 27.2 | -22~16 | -28~33 | 9247 | 36 |
| 11 | (6, 57) | 32.7 | 32.9 | -10~10 | -48~11 | 9835 | 32 |
| 12 | (10, 59) | 34.0 | 34.2 | -13~15 | -58~7 | 10125 | 34 |
| 13 | (12, 65) | 31.5 | 31.6 | -9~9 | -51~22 | 11678 | 36 |
| 14 | (15, 73) | 28.7 | 29.0 | -19~22 | -52~12 | 12872 | 39 |
| 15 | (10, 60) | 33.3 | 33.6 | -18~22 | -46~20 | 14883 | 33 |
| 16 | (4, 58) | 30.9 | 31.2 | -33~16 | -7~65 | 15824 | 33 |
| 17 | (5, 56) | 32.7 | 32.9 | -13~11 | -62~12 | 13764 | 36 |
| 18 | (4, 59) | 30.3 | 30.5 | -18~12 | -10~69 | 12748 | 37 |
| 19 | (9, 58) | 34.0 | 34.3 | -21~14 | -30~51 | 14782 | 33 |
| 20 | (11, 60) | 34.0 | 34.3 | -15~16 | -68~14 | 15824 | 34 |
| 21 | (8, 76) | 24.5 | 24.7 | -20~23 | -59~25 | 16732 | 35 |
| 22 | (7, 52) | 37.0 | 37.2 | -21~36 | -12~41 | 15320 | 35 |
| 23 | (8, 80) | 23.2 | 23.3 | -13~18 | -21~38 | 16329 | 37 |
| 24 | (6, 70) | 26.0 | 26.2 | -13~21 | -17~33 | 15897 | 38 |
| 25 | (8, 57) | 34.0 | 34.2 | -21~16 | -51~10 | 17231 | 35 |
| 26 | (10, 58) | 34.7 | 34.9 | -32~12 | -44~24 | 17467 | 33 |
| 27 | (8, 58) | 33.3 | 33.4 | -21~30 | -22~18 | 18237 | 32 |
| 28 | (10, 66) | 29.8 | 29.9 | -19~12 | -40~38 | 15983 | 36 |
| 29 | (9, 61) | 32.1 | 32.3 | -14~15 | -54~31 | 18476 | 37 |
| 30 | (5, 58) | 31.5 | 31.7 | -20~17 | -18~36 | 19238 | 38 |
| 31 | (11, 75) | 26.0 | 26.2 | -13~21 | -26~33 | 19782 | 36 |
| 32 | (7, 55) | 34.7 | 34.8 | -13~17 | -12~17 | 18472 | 32 |

N/A=not applicable

Finally, the dead time of the realized TDC channels is
mainly determined by the oscillation period of the Vernier
delay line and the maximal oscillation numbers. The oscillation
period (from Fig.5) is about 5 ns and the maximal oscillation
number is 80 (from Table III, channel No.23) leading to the
dead time of 5 × 80 = 400 ns.
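The dead-time estimate above can be reproduced per channel with a short Python sketch. It assumes that every channel shares the approximate 5 ns oscillation period quoted from Fig.5 and uses the nm values listed in Table III; the variable names are chosen for illustration only.

```python
# Approximate oscillation period of the Vernier delay line (from Fig.5), in ns.
T_OSC_NS = 5.0

# Maximal oscillation numbers nm for channels 1..32, copied from Table III.
N_M = [64, 60, 76, 64, 64, 60, 58, 59, 68, 68, 57, 59, 65, 73, 60, 58,
       56, 59, 58, 60, 76, 52, 80, 70, 57, 58, 58, 66, 61, 58, 75, 55]

# Per-channel dead time under the shared-period assumption.
dead_time_ns = [T_OSC_NS * n for n in N_M]

# The channel with the largest nm sets the worst-case dead time.
worst_channel = max(range(len(N_M)), key=N_M.__getitem__) + 1
worst_dead_time_ns = max(dead_time_ns)
```

With these inputs the worst case is channel No.23 at 400 ns, while the channel with the smallest nm (No.22, nm = 52) would recover after about 260 ns under the same assumption.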
IV. DISCUSSION
Carry chains are usually organized in TDL style which is
the mainstream realization method for FPGA-based TDCs.
This method provides low implementation complexity since
the carry chain based TDL can be automatically synthesized by
the compilation software without any manual intervention. However,
a plain TDC constructed by the TDL method usually suffers
from large DNL and INL. Fortunately, with well-developed
optimization techniques applied, such as the wave union
[6] or multi-chains averaging technique [10] to improve the
equivalent resolution and the bin-by-bin calibration technique
[2] to improve the INL, this TDC method becomes very promising
for practical use.
This paper emphasizes the Vernier method by organizing
the carry chains in RO style. This method has proven
very competitive in terms of resource cost, DNL and
INL when compared with the TDL method for a plain TDC
design. Its shortcomings are that the realized resolution is
not yet as high as that of the TDL method, the dead
time is relatively longer, and manual intervention to adjust
the RO period difference is needed during the design process.
However, similar optimization techniques such as the
multi-chains averaging and bin-by-bin calibration can also be applied
to this kind of TDC to further improve its performance. Most
importantly, applying the multi-chains averaging technique is
very valuable for suppressing the large precision RMS and for
pushing the resolution capability of such TDCs down to the 10
ps level. Some performance comparisons between this work
and some other recent FPGA-based works are summarized in
Table IV.
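To give a feel for what reaching the 10 ps level by multi-chains averaging might require, the sketch below estimates the number of averaged chains. It rests on an idealized assumption that is not verified in this paper: the errors of the individual chains are independent, so the RMS of the averaged result falls as 1/sqrt(N). The function name is hypothetical.

```python
import math

def chains_needed(sigma_single_ps, sigma_target_ps):
    """Sketch: number of averaged chains needed to reach a target precision RMS.

    Assumes independent, identically distributed errors per chain, so that
    sigma_avg = sigma_single / sqrt(N). Correlated errors would require more chains.
    """
    return math.ceil((sigma_single_ps / sigma_target_ps) ** 2)

# Precision RMS bounds reported for this work: 32 ps and 39 ps.
n_best = chains_needed(32.0, 10.0)
n_worst = chains_needed(39.0, 10.0)
```

Under this assumption roughly 11 to 16 chains per channel would be needed, so the low register and LUT cost of a single RO-based channel is what makes the approach plausible.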
TABLE IV
PERFORMANCE COMPARISONS BETWEEN THIS WORK AND SOME OTHER RECENT FPGA-BASED WORKS

| ref. | chip | method | resolution (ps) | precision RMS (ps) | DNL (LSB) | INL (LSB) | dead time (ns) | registers used | LUTs used |
|---|---|---|---|---|---|---|---|---|---|
| [4] | Virtex-5 | TDL | 30 | 15 | -1 ~ 3 | -4 ~ 4 | 30 | 571 | 1064 |
| [5] | UltraScale | TDL | 4.5 | 3.9 | N/S | N/S | 4 | N/S | N/S |
| [9] | Virtex-6 | TDL | 10 | 12.8 | -1 ~ 1.91 | -2.2 ~ 3.93 | N/S | N/S | N/S |
| [11] | Cyclone II | TDL | 21.8 | 28.8 | N/S | N/S | N/S | 23494 | 28085 |
| [12] | Virtex-6 | TDL | 10 | 19.6 | -1 ~ 1.5 | -2.25 ~ 1.61 | 3.3 | N/S | N/S |
| [13] | Spartan-3 | pulse shrinking | 42 | 56 | -0.98 ~ 0.6 | -4.17 ~ 3.6 | 710 | N/S | N/S |
| this work | Stratix III | Vernier | 23 ~ 37 | 32 ~ 39 | -0.4 ~ 0.4 | -0.7 ~ 0.7 | 400 | 319 | 104 |

N/S=not specified

V. CONCLUSIONS
Our recently proposed RO-based TDCs, which organize the
carry chains in the Vernier loop style, are a promising option
for TDC designers mainly due to their remarkably low
linearity error and low resource cost. However, they pose an
implementation complexity problem, since the design calls for manual
intervention to the initial fitting results when moving to
multi-channels TDC designs. To combat that problem, this
paper elaborates the key points for constructing multi-channels
TDCs that achieve high performance while keeping the design
complexity to a minimum: one for the clock extraction module and
one for the fine time interpolator module. Furthermore, the
PDR method is proposed to search the potential DU number
combination sets for a targeted resolution, which costs at most
31 trials in our example design. The PDR method greatly
reduces the implementation complexity during the fine time
interpolator construction process. This paper built a 32-channels
TDC on an Altera Stratix III FPGA and demonstrated good
performance. This paper greatly eases the design difficulty
of the carry chain RO-based TDCs and can significantly propel
their development in practical use.
REFERENCES
[1] D. J. Kadrmas, M. E. Casey, M. Conti, B. W. Jakoby, C. Lois, and D. W.
Townsend, “Impact of time-of-flight on PET tumor detection,” Journal
of Nuclear Medicine, vol. 50, no. 8, pp. 1315–1323, 2009.
[2] J. Wu, “Several key issues on implementing delay line based TDCs
using FPGAs,” IEEE Transactions on Nuclear Science, vol. 57, no. 3,
pp. 1543–1548, June 2010.
[3] J. Song, Q. An, and S. Liu, “A high-resolution time-to-digital converter
implemented in field-programmable-gate-arrays,” IEEE Transactions on
Nuclear Science, vol. 53, no. 1, pp. 236–241, Feb 2006.
[4] L. Zhao, X. Hu, S. Liu, J. Wang, Q. Shen, H. Fan, and Q. An, “The
design of a 16-channel 15 ps TDC implemented in a 65 nm FPGA,”
IEEE Transactions on Nuclear Science, vol. 60, no. 5, pp. 3532–3536,
Oct 2013.
[5] Y. Wang and C. Liu, “A 3.9 ps time-interval RMS precision time-to-
digital converter using a dual-sampling method in an UltraScale FPGA,”
IEEE Transactions on Nuclear Science, vol. 63, no. 5, pp. 2617–2621,
Oct 2016.
[6] J. Wu and Z. Shi, “The 10-ps wave union TDC: improving FPGA
TDC resolution beyond its cell delay,” in 2008 IEEE Nuclear Science
Symposium Conference Record, Oct 2008, pp. 3440–3446.
[7] J. Wu, Z. Shi, and I. Y. Wang, “Firmware-only implementation of time-
to-digital converter (TDC) in field-programmable gate array (FPGA),” in
2003 IEEE Nuclear Science Symposium. Conference Record, Oct 2003,
pp. 177–181.
[8] J. Wang, S. Liu, L. Zhao, X. Hu, and Q. An, “The 10-ps multitime
measurements averaging TDC implemented in an FPGA,” IEEE Trans-
actions on Nuclear Science, vol. 58, no. 4, pp. 2011–2018, Aug 2011.
[9] J. Y. Won, S. I. Kwon, H. S. Yoon, G. B. Ko, J. W. Son, and J. S.
Lee, “Dual-phase tapped-delay-line time-to-digital converter with on-
the-fly calibration implemented in 40 nm FPGA,” IEEE Transactions
on Biomedical Circuits and Systems, vol. 10, no. 1, pp. 231–242, Feb
2016.
[10] Q. Shen, S. Liu, B. Qi, Q. An, S. Liao, P. Shang, C. Peng, and W. Liu,
“A 1.7 ps equivalent bin size and 4.2 ps RMS FPGA TDC based
on multichain measurements averaging method,” IEEE Transactions on
Nuclear Science, vol. 62, no. 3, pp. 947–954, June 2015.
[11] W. Pan, G. Gong, and J. Li, “A 20-ps time-to-digital converter (TDC)
implemented in field-programmable gate array (FPGA) with automatic
temperature correction,” IEEE Transactions on Nuclear Science, vol. 61,
no. 3, pp. 1468–1473, June 2014.
[12] M. Fishburn, L. H. Menninga, C. Favi, and E. Charbon, “A 19.6 ps,
FPGA-based TDC with multiple channels for open source applications,”
IEEE Transactions on Nuclear Science, vol. 60, no. 3, pp. 2203–2208,
June 2013.
[13] R. Szplet and K. Klepacki, “An FPGA-integrated time-to-digital con-
verter based on two-stage pulse shrinking,” IEEE Transactions on
Instrumentation and Measurement, vol. 59, no. 6, pp. 1663–1670, June
2010.
[14] J. Kalisz, R. Szplet, J. Pasierbinski, and A. Poniecki, “Field-
programmable-gate-array-based time-to-digital converter with 200-ps
resolution,” IEEE Transactions on Instrumentation and Measurement,
vol. 46, no. 1, pp. 51–55, Feb 1997.
[15] B. Markovic, S. Tisa, F. A. Villa, A. Tosi, and F. Zappa, “A high-
linearity, 17 ps precision time-to-digital converter based on a single-stage
Vernier delay loop fine interpolation,” IEEE Transactions on Circuits
and Systems I: Regular Papers, vol. 60, no. 3, pp. 557–569, March
2013.
[16] J. Yu, F. F. Dai, and R. C. Jaeger, “A 12-bit Vernier ring time-to-digital
converter in 0.13 um CMOS technology,” IEEE Journal of Solid-State
Circuits, vol. 45, no. 4, pp. 830–842, April 2010.
[17] K. Cui, Z. Ren, X. Li, Z. Liu, and R. Zhu, “A high-linearity, ring-
oscillator-based, Vernier time-to-digital converter utilizing carry chains
in FPGAs,” IEEE Transactions on Nuclear Science, vol. 64, no. 1, pp.
697–704, Jan 2017.
[18] A. A. Abidi, “Phase noise and jitter in CMOS ring oscillators,” IEEE
Journal of Solid-State Circuits, vol. 41, no. 8, pp. 1803–1816, Aug 2006.
[19] J. Wu, “Uneven bin width digitization and a timing calibration method
using cascaded PLL,” in 2014 19th IEEE-NPSS Real Time Conference,
May 2014, pp. 1–4.
