Abstract-Coincidence time resolution is one of the most important issues in PET detectors. Improving this resolution is required to increase the noise equivalent count rate (NECR) that reduces the noise in the reconstructed images. The aim of this work is to evaluate the behavior and time resolution of different proposed time pick-off algorithms in order to select the best configuration for our PET system. The experimental setup used for this research is composed by two monolithic LSO crystals+PSPMT detectors and an FPGA based PET data acquisition system ( The results show the importance of selecting the right algorithm and parameters. Time coincidence resolution in our hardware system can be improved by up to 6.9 ns FWHM depending on the chosen digital algorithm programmed on the FPGA. The measurements with our setup reveal that charge based algorithms are less sensitive to signal noise and generate better results than amplitude algorithms. The best configuration achieves a FWHM resolution close to 1.8 ns.
I. INTRODUCTION

P
OSITRON EMISSION TOMOGRAPHY (PET) is a medical diagnosis technique which is basically used in oncology, neurology, and cardiology. Radiotracers, which are substances marked with radioactive isotopes, are required to obtain functional information about the physiological processes that occur inside the animal or human body. PET images the distribution of these radiotracers. The technique is based on the detection of two time-coincident 511 keV high energy gamma rays resulting from the annihilation of a positron with an electron. PET detectors need to obtain the energy, the position and the arrival time of each detected gamma ray. Coincidence time resolution is one of the most important issues in PET detectors. Improving this resolution is required to increase NECR and random event rejection, so the noise in the reconstructed images is reduced [1] .
Traditionally, time pick-off techniques have been implemented analogically. Analog techniques are developed for a specific detector type so they can be optimized to achieve very accurate time resolutions. However, very complex electronics are needed and they cannot be modified once they have been developed [2] . The system dead time is worse than in digital systems.
The most popular analog time pick-off methods are Leading Edge (LE), Constant Fraction Discriminator (CFD) and ARC (Amplitude and Rise time Compensated).
In the LE algorithm, the timestamp assigned to a pulse is related with the instant when the pulse crosses a constant threshold. The main disadvantage of LE is that the generated timestamp depends on the height and rise time of the incoming pulses [3] , [4] .
In the CFD algorithm, the input signal is delayed and a fraction of the undelayed input is subtracted from the delayed one, generating a bipolar signal which has a zero crossing point. This point is selected as the timestamp. The delayed time should be enough to take the maximum value of the input signal [3] , [4] . CFD resolves the LE error problem due to amplitude variations, but it does not resolve the error due to rise time variations. Different versions of the method have been proposed [5] , [6] .
ARC was proposed to minimize the timestamp error originated by rise time variations in the incoming pulses. As in CFD, a bipolar signal is generated but, in this case, the delayed time is taken before the input signal reaches the maximum. The best results for this algorithm are achieved for small delay times. This algorithm minimizes the errors caused by height and rise time variations.
Time resolutions obtained with these analog methods are in the range between 300 and 500 ps [7] , [8] .
Digital time pick-off methods are being increasingly considered to replace analog processing electronics in PET systems [7] . For this reason, investigation focuses on PET systems based on an early digitalization of the detected signals followed by an intense digital processing stage. Different digital time pick-off algorithms have been proposed to extract an accurate timestamp from the detected events. Many of them are based on the traditional analog methods, but new techniques have also been proposed. Currently, digital time pick-off algorithms cannot achieve the analog techniques time resolution. But several approaches have been proposed and analyzed in recent years to improve the time resolution of digital techniques.
Many studies proposed digital time pick-off algorithms and evaluated them offline. Visser et al. [9] analyzed a digital version of the LE method. Time difference was measured using an oscilloscope in a system based on LSO/APD and LSO/PMT detectors. Nakhostin et al. [10] presented a digital ARC (DARC) in a CdTe PET system. In order to test the algorithm, the waveforms from the detectors were collected using an oscilloscope and the resulting data were analyzed offline using Matlab.
In other approaches, digital versions of the CFD (DCFD) were applied. Fallu-Labruyere et al. [7] used this method in a system based on a commercial digital pulse processing module coupled to pairs of PMTs with either LSO or LaBr scintillator crystals. Joly et al. [11] applied also a DFCD in a set-up with two detection blocks based on photodetectors (APD or PMT) with inorganic scintillators. The signals were sampled at high frequencies and analyzed offline. In this case, the DCFD was compared with two optimal filtering techniques based on parameter estimation with minimal variance. Leroux et al. [12] also compared the DFCD with other digital time pick-off methods: time-shift LMS error model, linear fit and BGO model fit. The acquired signals came from a system with BGO/APD detectors.
Finally, Bousselham et al. [13] , [14] proposed an approach based on the analysis of the noise in LSO/PMT and LSO/APD detectors. They showed that the noise in these systems was nonstationary and derived a timing algorithm for this kind of noise using the least square method.
Other studies have implemented in FPGAs versions of the previously described methods and have also proposed new ones. Guerra et al. [15] proposed different algorithms, including linear interpolation, DCFD, a classical matched filter (which is usually considered to be optimum in the presence of additive noise) and an optical matched filter (which is meant for Poisson noise). In their work, they provided details of the actual hardware implementation of the timing algorithm, as part of small animal PET system [16] .
Streun et al. [17] implemented a method based on the calculus of the baseline crossing of the line through the two samples before the pulse maximum, obtaining a time resolution of 2 ns FWHM in a LSO/PMT system. In other studies [18] , [19] , a method based on a simple filter with a callibration procedure was implemented and tested for the Clear-PEM Scanner. A time coincidence resolution of 5-5.3 ns FWHM was achieved.
Haselman et al. [20] proposed a method based on the least square approximation of the shape of the pulse with two exponentials (for the rising and falling slopes). Once the fitting function had been calculated, the beginning of the pulse was obtained and used as timestamp. The method is similar to LE, but data are normalized to the reference pulse, so effects of amplitude variation are eliminated. The method was implemented [21] in an FPGA with different kinds of detectors, obtaining time resolutions higher than 4.5 ns. Martinez et al. [22] evaluated a version of a DCFD in a test set-up consisting of two opposite LSO detectors. A resolution of 5.17 ns FWHM was obtained. Fontaine et al. [2] proposed a linear interpolator, equivalent to a DCFD, to calculate the timestamp. Their scanner was based on APDs, and the digital processing was shared between an FPGA and a DSP. Timing resolutions ranging between 5.4 ns and 9.8 ns were obtained for different types of detectors. They suggested that better results may be obtained with more advanced digital timing algorithms based on artificial neural networks [12] or low-pass filtering and linear interpolation [23] .
In previous works [24] , [25] , we also proposed several digital time pick-off methods to improve time resolution. In the present work, the main goal is to observe the behavior and time resolution of the proposed algorithms in our PET system, implementing the methods in the FPGA of the data acquisition system (DAQ). Two different groups of time pick-off methods have been studied: amplitude-based algorithms, which work as a DCFD, and charge-based algorithms, which work as a DARC. The charge-based algorithms were first proposed in one of our previous studies [24] .
Another objective of the current study is to analyze the influence on time resolution of upsampling the input signals by means of digital processing techniques. For this task, polyphase low-pass filter interpolators have been implemented in the FPGA. These structures allow us to implement the filter interpolator structure at low clock frequency.
All the algorithms have been studied for different configurations to better understand its behavior.
II. INSTRUMENTS AND METHODS
A. Experimental Setup
The experimental setup used for this research is composed by two LSO detectors separated 700 mm and an FPGA based PET DAQ [26] , [27] . Each head has one 42 42 10 mm monolithic LSO crystal and a position sensitive photomultiplier (PSPMT), Hamamatsu H8500 (Hamamatsu Photonics K.K., Shizuoka, Japan). PSPMT signals from one head are translated into four signals for position and energy, and one for depth of interaction calculus, using a developed ASIC that integrates a charge division network [26] , [28] . The PSPMT detector's last dynode signal is used for time extraction and trigger. Due to its fast rise time and low amplitude, this signal should be shaped and amplified before being sampled. A CR-RC shaper has been used. For the high pass stage, a passive first order filter with a pole-zero cancellation is implemented. For the low pass section, a second order Sallen Key cell is implemented. At least one sample in the rise time of the incoming event is needed for the proposed digital algorithms. The shaped last dynode signal has a rise time of about 45 ns. As the system has the possibility of working at 35 MHz or 70 MHz, this rise time guarantees the required condition in both cases. In the present study, the chosen clock frequency was 70 MHz.
The selected shaping determines the width of the incoming pulses. This width and the maximum pileup rate considered for the system determine the maximum event rate for the proposed time pick-off methods. In our case, at 70 MHz, the selected shaping generates a pulse width of approximately 150 ns, so this is the minimum separation between two consecutive events that our system can differentiate. That limits the maximum detectable event rate.
The DAQ system [27] is composed by six PCB: four DAQ cards, one coincidence detection card and a back plane card. Each of the DAQ cards acquires the signals from the detector heads independently. The acquired signals are sampled using a 8 channels 12-bit 70 MHz Texas Instruments ADC ADS5273. Digitalized signals are processed in each board using a Xilinx Virtex 5 FPGA XC5VLX85T (Xilinx, San Jose, CA, USA). The FPGA extracts position, depth of interaction (DOI), energy and timestamp from the acquired signals. The time pick-off block in Fig. 1 is where the proposed algorithms are implemented.
All the information from these four boards is sent to a new PCB by means of a custom-made backplane. This board has another Virtex 5 FPGA XC5VLX85T, where the event coincidence detection is implemented. The true coincidences are sent to a standard personal computer (PC) through a 2.0 USB link. For this experiment, a coarse coincidence window is implemented and set to 28.57 ns. This allows us to reduce the number of events received in the PC and to have temporal information from the detector's acquired pulses.
B. FPGA Programed Digital Time Pick-Off Methods
Four digital algorithms to extract time information from the acquired pulses have been evaluated: (a) Amplitude bipolar digital constant fraction discriminator (BCFD), (b) charge BCFD, (c) interpolated amplitude BCFD and (d) interpolated charge BCFD. As can be seen in Fig. 2 , each of them is a combination of different digital signal processing methods that are described below.
1) Charge Estimation:
The charge of the pulse (ch[n]) can be estimated in a digital system as:
(1) where x[n] is the input acquired signal, is the value when x[n] exceeds the init threshold (the trigger condition), and is the value when x[n] falls below the end threshold (the end condition). One advantage of using calculated charge instead of directly the sampled signal is that it compensates errors due to signal noise.
2) Low Pass Filter Interpolation: Digital low pass filter interpolation is a way to upsample the acquired signals. This method is based on the Nyquist sampling theorem, which states that it is possible to recover a continuous signal from the obtained sampled signal, as long as the sampling frequency is greater than twice the signal bandwidth [29] . Two different architectures for the interpolation algorithm have been used, a classical architecture for one-sample interpolation and polyphase filters architecture for two-sample interpolation. These methods allow us to work with two different FPGA internal sampling frequencies: 140 MHz and 210 MHz.
The classical method interpolates the signals adding zeros between the original samples x[n], where I is the interpolation factor.
(2)
Then, the signals are low pass filtered using a finite impulse response (FIR) filter with a cutoff frequency equal to fs/(2I), where fs is the original sampling frequency (70 MHz in our case):
where y[n] is the interpolated signal and are the digital filter coefficients. This classical method was implemented in the interpolation by factor 2 (one-sample interpolation). A 21 coefficient digital low pass FIR filter was used. To calculate those coefficients, a Hamming window method was selected because of its good attenuation on the stop band and a low ripple at the band pass. Sixteen bits per coefficient were used. The cutoff frequency was 70/4 MHz.
The implementation in a FPGA should be optimized to work with the new high frequencies needed in the interpolation methods. One of the most usual problems working with FPGAs is the logic speed for an implemented design. For this reason, polyphase structures have been implemented [29] . This architecture allows us to filter the signal in the low frequency domain, before upsampling it. The number of filters is increased, but the number of total coefficients is not. To interpolate the incoming signal by a factor 3 (210 MHz, two-sample interpolation) a 33 coefficient low pass filter using a Hamming window is calculated. The cutoff frequency is 70/6 MHz. The number of coefficients for the filter has been increased to reduce the transition band compensating the reduction of the cutoff frequency. As can be seen in Fig. 3 , three FIR polyphase filters should be generated from the original FIR filter. The coefficients for the polyphase filters are obtained as: (4) where I is the interpolation factor, are the coefficients of the filter are the coefficients of the original filter and R is the number of coefficients of each polyphase filter. R should be a integer number, which is calculated as , where M is the number of coefficients of the original filter. The designed interpolation has . At each low frequency clock cycle, I outputs from each polyphase filter are generated. The interpolated signal y[n] is obtained as:
. . . where x[n] is the signal to be bipolar transformed, k represents the shift that is applied to generate the bipolar signal and A is a constant value (2 in charge methods and 1 in the amplitude ones). A should be greater than 1 in the charge signals to generate a bipolar signal with a zero crossing point. The value has been selected because it is easy to implement in a digital system.
The two samples nearest to zero define a line that crosses zero in a time that is considered the timestamp for the pulse. In Fig. 4 , it is shown how a timestamp is obtained from a charge generated pulse. S1 and S2 are the samples before and after the zero crossing point. The timestamp is calculated as: (7) (8) where is the timestamp in samples, is the sample number for is the timestamp in seconds and is the equivalent sampling frequency (in our case, 70 MHz, 140 MHz or 210 MHz depending of the implemented method).
The algorithm has been tested for different values of k to evaluate its effect on time resolution.
C. Hardware Implementation
Time pick-off methods have been implemented inside the DAQ Xilinx Virtex 5 FPGA XC5VLX85T using ISE 10.1 for synthesis and implementation. In this version, there was no core for implementing the FIR filters in a Virtex5 device using distributed arithmetic. All the filters used in this design where fully programed in verilog using for each coefficient tap distributed arithmetic multipliers generated by ISE coregen tool. The coefficients for each filter were calculated using the Matlab 7.5 filter design tool.
A transposed form to implement the FIR filters bas been used. This structure improves the design speed because it reduces the Fig. 4 . The BCFD block for a charge signal generates a bipolar signal that has a zero crossing value. Using the S1 and S2 values before and after the zero crossing the timestamp is calculated. Fig. 5 . Measured Na energy spectrum from the detector nearest to the source. The data has been filtered in position in both detectors and with a coarse coincidence window (28.57 ns). The points corresponding to the 10% of the histogram are marked in black. logic path between registers, which is composed by just one multiplier and an adder [29] .
The implemented structures used for the algorithms fulfill the specifications in time and area of our FPGA. The area from the designed filters can be improved using the coefficient symmetries from the FIR filters and removing the multipliers with zero coefficients. The new ISE 12 version has several improved cores that can be used. Polyphase filter interpolators using distributed arithmetic are available for Virtex 5 and Virtex 6 devices.
The divider core allows us to implement a division that generates a result each clock cycle with a fixed clock cycle latency. The result of the divider is given in fractional form, so the fine time stamp is directly available. The division core is implemented at low frequency in the event builder. This is possible because the event rate in each detector is low compared with the 70 MHz lowest frequency from the FPGA. In the present work, the divider has been bypassed to obtain a non-quantified time resolution.
The digital signal processing logic at high frequency (210 MHz) has been extremely pipelined. In the used device, it is necessary to reduce the logic path to achieve the required speed. It would be difficult to increase the interpolator factor following this architecture. For interpolators which work for frequencies greater than 210 MHz, parallelization of the algorithms would be necessary. Parallelization is not a trivial task, but the low area occupancy for the proposed algorithms allows us to do it in future works.
In Table I , area and resources utilization of each algorithm are shown. The results include all the signal processing modules, the divider, the control logic and the needed logic for the cross clock domains. As reference, the whole system resources occupancy for the most complex configuration (charge BCFD interpolated by 3) is added. As can be seen, the algorithms are not area restrictive.
D. Coincidence Event Selection
Each algorithm has been evaluated with different configurations in this setup using a Eckert and Zigler 10 Ci 1 mm Na spot source. The source was always closer to the center of one detector so, as will be seen in the results, the time coincidence histograms are not zero centered. A coarse coincidence window (28.57 ns) is selected to reduce the number of non-coincidence data received at the PC.
The detected pulses are filtered in position and energy. The energy histogram of the position filtered data, in each detector, is fitted by a Gaussian curve. Only the pulses of central area of the histogram have been considered (those between the energies corresponding to the 10% of the maximum of the histogram). One of the effects of filtering by position is a reduction in the number of Compton events. In Fig. 5 , an energy spectrum from the closer detector to the source after being filtered by position can be seen. Between 45 000 and 140 000 events have been analyzed in each configuration. Table II .
In Fig. 6 , amplitude and charge BCFD are compared for different configurations. A synthesized frequency is represented in each graph (non-interpolated, interpolated by factor 2, and interpolated by factor 3). In the non-interpolated case, charge BCFD gives us better results for delays smaller than , although for greater delay values, amplitude BCFD resolution is better, achieving a minimum for . When an interpolation with factors 2 or 3 is applied, the charge BCFD has better resolution for all the evaluated delays.
The evolution of FWHM time resolution as a function of the delay can also be observed in Fig. 6 . The charge BCFD algorithms achieve their best results for low delays. For high delays, the results converge to a constant FWHM value. Regarding amplitude BCFD algorithms, different behaviors have been observed. In this case, the worst results are observed with low delays. There is an optimal delay value which has the minimum FWHM time resolution. For delays greater than this value, the FWHM time resolution increases. Best time resolution using charge BCFD is 1.82 ns FWHM, as can be seen in Fig. 7 . In this case, the charge BCFD was configured with and interpolated by 3. For amplitude BCFD, the best result that has been achieved in this work is 4.58 ns FWHM, as can be observed in Fig. 8 . In this case, the Amplitude BCFD was configured with and no interpolation was applied.
The method with worst results had 8.7 ns FWHM. This result was achieved for amplitude BCFD, , interpolated by 3. Difference between the best and the worst results is 6.9 ns.
The coefficient k can be seen as a time shift, just multiplying k by the sampling period. In this way, all the charge or amplitude methods can be represented in the same graph as a function of this delay (Fig. 9) . This shows us the influence of interpolation on time resolution. It can be observed that there are points where the interpolated and non-interpolated methods are equivalent. For example, is a configuration equivalent to and , because the time delay is equal in all cases ( ns).
IV. DISCUSSION AND CONCLUSIONS
A good time resolution, using real time digital signal processing, has been achieved in this work, in a real setup with two LSO PSPMT detectors and an FPGA-based digital acquisition system with no dedicated analog-trigger electronics. Results show that time resolution in pure digital PET systems can be improved choosing the right digital signal processing method. The obtained resolutions with the different methods are comparable with current state-of-the-art approaches [2] , [17] - [19] , [21] , [22] . It is interesting to remark that slightly different configurations of the same algorithm can produce important changes in time resolution (a maximum difference of 6.9 ns between methods has been obtained in the current study, which implies a 79% reduction from the worst value).
It has been observed that, as a rule, amplitude BCFD methods generate worse results than equivalent charge BCFD methods. Possibly, charge methods achieve better results because: (1) Charge algorithms are less sensitive to ambient noise. (2) Charge BCFD is a DARC method, based on analog ARC time pick-off algorithms, which are less sensitive to variations in amplitude and rise time. ( 3) The bipolar signal that is generated in the charge BCFD methods has a more linear behavior near the zero crossing point than the equivalent amplitude BFCD methods. Calculating the charge inside the FPGA and using it to obtain the time from the arriving pulse improves time coincidence resolution with a low use of FPGA resources.
Charge BCFD best results are obtained near the beginning of the pulse, as happens with analog ARC methods. This seems to indicate that better resolutions would be obtained with charge BCFD methods with higher interpolation rates and lower delays. However, there would be a practical limit due to the noise of the signals and the ADC quantification.
On the other hand, in amplitude BCFD methods, best results are obtained when the delay is increased. For low delays, the timestamp that amplitude BCFD methods generate is related with the maximum of the original pulse. This is a non-linear zone of the pulse, and that introduces an error in the determination of the zero crossing point. When the delay is increased, the timestamp is associated with the linear zones of the incoming pulses, so the timestamp determination errors are reduced.
Interpolation gets equivalent (but a little worse) results in points coincident with non-interpolated configurations (as can be observed in Fig. 9 ). However, it has an important advantage: interpolated methods can be configured in points that are not available in non-interpolated methods. For this reason, there are configurations with good time resolutions that would not be available if interpolation were not applied.
In this work, polyphase interpolator filters have been used. This structure implements the digital signal processing required for the interpolation at low frequencies and without an increase in FPGA occupation area when compared with the traditional structure for the same interpolation.
The best time resolution achieved in the present work, compared with the worst result, will produce a reduction in noise variance and an important increment in NECR [1] .
This work has evaluated the influence on time resolution of different interpolation factors and different delays used to generate the bipolar signal. It has also proposed the use of a digital calculation of the charge to obtain the timestamp. However, other parameters could also be important. For this reason, the influence of different factors such as the amplitude A (6), or the analog shaping that is applied to the last dynode signal, should be considered in future works.
