Abstract-This work presents a multi-channel time-to-digital converter (TDC) based on a field-programmable gate array (FPGA). This TDC shares many advantages of custom circuitry but few of the drawbacks. A thorough characterization of the TDC, based on a Xilinx Virtex-6 FPGA, is presented and several performance parameters are described, including distortions due to the FPGA architecture, temperature effects, intra-chip position variation, and chip-to-chip variation. An optimized TDC exhibits 10 ps resolution, 3.86 LSB integral non-linearity, and a throughput of 300 MS/s. Measurements are also shown for TDC-to-TDC distortion in multi-channel TDCs, and simulations are performed to investigate parallelism using multiple TDCs. The results imply that FPGA-based TDCs can achieve high performance and can be used in a wide range of applications requiring high throughput and accurate time measurements.
I. INTRODUCTION
TIME interval measurements are required in many applications. In the field of positron emission tomography (PET), time interval measurements are indirectly used to narrow down the location of positron emission, thus improving SNR.
Especially in time-of-flight PET, the accuracy of the time measurement is critical for data reconstruction. Throughput and the number of channels are also important, as they affect the measurement speed and the system-level complexity.
For PET, it is desirable that physical constraints, such as scintillator coupling or sensor response, be the limiting factor, rather than the timing measurement. High demands are therefore placed on the time measurement devices, and sub-100 ps systems with high accuracy and throughput are desired.
For PET systems, effective CMOS implementations have been shown [1], [2]. The downside of CMOS implementations is the custom development process, which is time consuming and difficult to adapt to even a slightly different system. Recent developments show a growing interest in time measurement circuits implemented in FPGAs [3]-[5]. FPGAs are integrated circuits that consist of blocks of predefined logic. This logic is organized in such a way that all kinds of digital logic can be easily created using a hardware description language. The description language makes it possible to implement, efficiently, the same hardware 
II. ARCHITECTURE
There are different ways of building a TDC on an FPGA.
The major architectures are based on simple delay lines and Vernier delay lines [3], [4], [7]-[10]. Other structures have also been invented to take advantage of the properties of the FPGA logic [5], [11], [12]. From this work it is clear that there is a trade-off between resolution and accuracy, and that great care needs to be taken in the design phase. To choose the best TDC architecture, it is necessary to look more closely at the FPGA architecture. Not only does the size of the layout need to be taken into account, but also the availability of the required resources on the chip and the possible distribution of the start and stop signals across the chip.
An FPGA has a pre-defined structure based on look-up tables (LUTs), additional selection, and carry logic. The LUTs are used to define logic functions, and the selection and [...] adders. The carry logic of a slice can be used to create a TDC, and it is the basis for the architecture in the present work. This carry logic is widely available on-chip, and the implemented delay line structure requires less calibration than Vernier-delay-line-based methods. The TDC architecture is based on the Nutt architecture used in [10], which consists of a coarse counter for the most significant bits and a fine counter, implemented using a delay line, for the least significant bits of the time interval. The fine counter is supported by an encoder for mapping the binary value to a time representation, while the coarse counter can be read directly without such an encoder.
978-1-4673-0120-6/11/$26.00 ©2011 IEEE
A schematic view of this two-stage architecture is shown in [...] manager is used to achieve a low-jitter 600 MHz clock [13].
The ROMs attached to the encoder are used to compensate the fine delay line result for non-linearity. One ROM is for static non-linearity, while the other is for dynamic non-linearity and needs to be updated every calibration period.
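The two-stage measurement described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the ones-counting encoder, the sign convention (the fine value is assumed to measure the time from the hit to the next clock edge), and the ideal 10 ps look-up table standing in for the ROM contents are all assumptions made for illustration.

```python
# Sketch of a Nutt-style timestamp: coarse counter plus calibrated fine bin.
# All names and the sign convention are illustrative assumptions.

CLOCK_PERIOD_PS = 1e6 / 600.0  # 600 MHz system clock, about 1666.67 ps


def encode_thermometer(taps):
    """Convert a thermometer code (sequence of 0/1 taps) to a bin index.

    Counting the ones tolerates isolated bubbles in the code.
    """
    return sum(1 for tap in taps if tap)


def timestamp_ps(coarse_count, taps, lut_ps):
    """Combine the coarse count with the LUT-corrected fine time.

    Assumes the fine value measures the time from the hit to the next
    clock edge, so it is subtracted from the coarse time.
    """
    fine_bin = encode_thermometer(taps)
    return coarse_count * CLOCK_PERIOD_PS - lut_ps[fine_bin]


# Hypothetical ideal LUT with uniform 10 ps bins (stands in for the ROMs).
ideal_lut = [10.0 * i for i in range(200)]
taps = [1] * 5 + [0] * 195  # hit propagated through 5 delay elements
print(timestamp_ps(3, taps, ideal_lut))
```

In a real design the look-up table would hold measured bin positions rather than uniform steps, which is where the non-linearity compensation enters.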
III. CHARACTERIZATION

A. Experimental setup
The performance of the TDC was characterized using a density test [14]. For a density test, a random time interval generator is needed. This is realized using a Single-Photon Avalanche Diode (SPAD), placed in the dark with a count rate below 10 kHz, giving a sufficiently uniform, random time distribution [15]. The SPAD is attached to the FPGA, with the TDC implemented on chip, using an SMA cable. The FPGA is placed inside a temperature chamber to control the environment temperature. Temperature and voltage data are sent to a computer workstation, as are all encoded TDC values.
The encoded values are analyzed offline to calculate the non-linearity. The FPGA-based TDC has its delay line shielded by one ring of blank slices, which have no implemented logic, so that external influence is excluded.
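The offline analysis of a density test can be sketched as follows. This is a generic code density calculation under the assumption that hits are uniformly distributed over the measured range, so that each bin's hit count is proportional to its width; the function name and the example data are hypothetical.

```python
def code_density(codes, n_bins):
    """Estimate DNL and INL (both in LSB) from recorded TDC codes.

    Assumes uniformly distributed hits, so an ideal TDC would collect
    the same number of hits in every bin.
    """
    counts = [0] * n_bins
    for code in codes:
        counts[code] += 1

    expected = sum(counts) / n_bins  # hits per bin for an ideal TDC
    dnl = [count / expected - 1.0 for count in counts]

    # INL is the running sum of the DNL.
    inl = []
    running = 0.0
    for value in dnl:
        running += value
        inl.append(running)
    return dnl, inl


# Tiny made-up data set with 4 bins, for illustration only.
dnl, inl = code_density([0, 0, 1, 2, 2, 2, 3, 3], 4)
print(dnl)
print(inl)
```

A real measurement needs many hits per bin for the statistical error on each DNL value to be small compared with the non-linearity being measured.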
B. Scaling
The first TDC based on the proposed architecture used a Virtex-5 (65 nm technology) [10]. Porting the architecture from the Virtex-5 to the Virtex-6 (40 nm technology) improved the resolution from 17 ps to 10 ps. The carry chain structure, used for the buffer delay line, has an identical interface in both implementations. With the scaling to a new platform, faster clocks became available, improving the throughput and making it possible to decrease the delay line length.
Decreasing the delay line length implies less jitter and less non-linearity induced by the clock distribution inhomogeneity.
Longer delay lines imply more jitter accumulation, because more elements contribute to jitter, and they also imply more clock region crossings, which increases the non-linearity.
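The relation between clock frequency and delay line length can be made concrete with a short calculation. The sketch below assumes that the delay line must span at least one full clock period and that all elements have the same nominal delay; the function name is hypothetical, and the 600 MHz and 10 ps figures are the values quoted in the text.

```python
import math


def min_delay_elements(clock_mhz, resolution_ps):
    """Smallest number of delay elements that covers one clock period.

    Assumes uniform element delays equal to the nominal resolution.
    """
    clock_period_ps = 1e6 / clock_mhz
    return math.ceil(clock_period_ps / resolution_ps)


# 600 MHz clock (about 1666.67 ps period) with 10 ps elements.
print(min_delay_elements(600.0, 10.0))
```

A practical delay line would be somewhat longer than this minimum to leave margin for temperature and process variation.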
C. Non-linearity 
D. Sources of static non-linearity
Some non-linearity can be calibrated beforehand. At system start-up the non-linearity must be measured and stored. The static non-linearity is caused by the chip structure, the clock distribution, and the local transistor properties. The chip structure is given; however, we will show that exploring this structure and other factors leaves room for improvement.
1) Logic structure:
The delay elements consist of carry logic, which is situated inside an FPGA slice. In the Virtex- [...] There is also more variability in the first half of the positions, caused by the fact that the clock regions are not symmetric, which can also be observed in Figure 3. [...] temperature is shown in Figure 9. Changes in temperature affect the propagation speed of the delay line, changing the number of delay elements that fit in one clock cycle, and therefore the resolution is affected by the temperature;
higher temperatures imply a worse resolution. At 10°C the resolution is below 10 ps and at 60°C it is a little above 10 ps, as shown in the table of Figure 9.
The temperature affects all the transistors, and therefore a certain offset is added to the DNL of each bin. Other phenomena besides the offset can be observed in the figure as well, because the order of the plotted lines switches around bin 120. This is caused by the clock distribution, so the two effects add up and both contribute to the change in non-linearity. No deterministic source of the temperature dependence has been found. A test with real-time compensation, using the on-chip temperature sensor and the non-linearity measured beforehand, was performed but proved unsuccessful. Real-time calibration is therefore needed to compensate for this variability. The calibration can be done by measuring the size of the delay line during a calibration period and using this result to correct the mapping of the thermometer code to a correct binary representation. This calibration period will reduce the throughput, depending on the calibration frequency, but will enhance TDC accuracy. [...] Figure 10.
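One possible reading of this calibration step is sketched below. It assumes that the size of the delay line is taken as the highest code observed during the calibration period plus one, and that the effective bin width is then the clock period divided by that size. The function names and the sample codes are hypothetical, and the paper's actual procedure may differ.

```python
def calibrate_lsb(calibration_codes, clock_period_ps):
    """Estimate the current bin width from codes seen while calibrating.

    Assumes the highest observed code marks the last delay element that
    still fits inside one clock period at the present temperature.
    """
    if not calibration_codes:
        raise ValueError("no calibration hits recorded")
    used_bins = max(calibration_codes) + 1
    return clock_period_ps / used_bins


def code_to_time_ps(code, lsb_ps):
    """Map a fine code to the centre of its bin, assuming uniform bins."""
    return (code + 0.5) * lsb_ps


# Made-up calibration run: 166 elements fit in a 1660 ps period.
lsb = calibrate_lsb([0, 42, 165, 99], 1660.0)
print(lsb, code_to_time_ps(4, lsb))
```

This sketch only rescales a uniform mapping; correcting the individual bin widths would additionally require the per-bin non-linearity data.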
During the test the temperature was kept constant using a temperature chamber. Three different test runs were made per 
IV. RESULTS
The final TDC implementation in the Virtex-6 is based on the characterization parameters found and discussed in the previous sections. The optimal TDC design can be derived and implemented by taking into account the static and the dynamic non-linearity. [...] Table I.
Voltage variation is negligible, but temperature variations need real-time calibration by introducing a calibration period.
TDCs placed side by side introduce additional non-linearity, which cannot be predicted, and require guard slices to shield the delay line from external influences. One guard slice around [...] The third parameter to consider in this trade-off is throughput. When TDCs are used purely for throughput, the throughput can be increased by a factor equal to the number of TDCs used.
No calibration period is taken into account yet, which will also influence the effective throughput. Another parameter is area: the number of TDCs that can be implemented on the chip depends on the available area, so larger chips allow more channels.
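The effect of parallel TDCs and of a calibration period on throughput can be estimated with a simple model. The sketch below assumes ideal scaling with the number of TDCs and treats the calibration period as a fixed fraction of time in which no measurements are taken; the function name and the 10% calibration fraction are hypothetical, while the 300 MS/s rate is the figure quoted in the abstract.

```python
def effective_throughput_msps(n_tdcs, rate_msps, calibration_fraction):
    """Aggregate throughput of parallel TDCs with calibration downtime.

    Assumes every TDC runs at rate_msps and spends calibration_fraction
    of its time calibrating instead of measuring.
    """
    if not 0.0 <= calibration_fraction < 1.0:
        raise ValueError("calibration_fraction must be in [0, 1)")
    return n_tdcs * rate_msps * (1.0 - calibration_fraction)


# Four TDCs at 300 MS/s each, with a hypothetical 10% calibration time.
print(effective_throughput_msps(4, 300.0, 0.1))
```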
So a trade-off needs to be made between resolution, accuracy, throughput, and area. TDCs placed in parallel can be very effective at increasing the performance even beyond the chip and architecture limitations, but the choice must be weighed carefully.
