ABSTRACT: A 64-channel ASIC for Time-of-Flight Positron Emission Tomography (TOF PET) imaging has been designed and simulated. The circuit is optimized for the readout of signals produced by the scintillation of a L(Y)SO crystal optically coupled to a silicon photomultiplier (SiPM). Developed in the framework of the EndoTOFPET-US collaboration [1], the ASIC is integrated in the external PET plate and performs timing, digitization and data transmission for 511 keV and lower-energy events due to Compton scattering.
Introduction
The advent of highly compact solid-state photodetectors and new scintillator materials paves the way for the development of compact Positron Emission Tomography (PET) systems with higher sensitivity and spatial resolution.
Such photodetectors, often called multi-pixel photon counters (MPPCs) or silicon photomultipliers (SiPMs), have high gain and are therefore sensitive to single photon hits. The fast rising edge of the output current signal suggests that they are suitable for extracting the time-of-flight (TOF) information of the back-to-back 511 keV photons emitted by the positron annihilation. This measurement can identify, with an error ∆x, the position of the annihilation along the chord that defines the travel path of the back-to-back photons between the two detector pixels (refer to figure 1) [2, 5]. While it does not increase the spatial resolution, this information significantly improves the background rejection of a PET system. Consequently, the reduction of the statistical noise caused by random coincidences improves the signal-to-noise ratio of the reconstructed image.
On the other hand, scintillation statistics add a timing uncertainty due to the photon transit time, which can set a lower bound on the achievable timing resolution. The scintillation light created by the γ-interaction is emitted in all directions, so part of the photon flux travels directly towards the interface with the sensor while another fraction is reflected. It has been shown that, considering the direct (forward) and longest (backward) photon paths, the contribution of the photon transit time to the CRT in a 20 mm long LYSO crystal can be as high as 230 ps [6]. It is thus essential to extract the time stamp of the first photoelectrons (p.e.) [4], which requires low-noise electronics able to set a trigger as low as 0.5 p.e.
The timing uncertainty due to the charge carrier transit time of the silicon detector also plays an important role. The choice of the threshold for the time trigger may therefore also be a function of the overvoltage applied. The authors of [3] claim to achieve a better CRT by setting a threshold equivalent to 3 p.e.
Figure 1. The EndoTOFPET-US aims at a 200 ps coincidence resolving time (CRT), corresponding to a spatial resolution of ∆x ≈ 30 mm along the line-of-response (LOR) of a dual-head PET detector. By confining the annihilation to a segment of the chord, the statistical noise introduced by the activity of other voxels aligned in the same LOR can be eliminated, as long as these other voxels are at least ∆x away. The position error, given by ∆x = CRT · c/2, corresponds to the diameter of the volume that defines the indicated region-of-interest (ROI). Although far from what is possible with state-of-the-art technology, a timing resolution of 10 ps would lower the position uncertainty to the range of 2 mm. This would effectively improve the spatial resolution of a PET scanner, as a direct mapping of the annihilation coordinate along the LOR would be possible.
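The relation ∆x = CRT · c/2 quoted in the caption of figure 1 can be checked numerically. The short sketch below is only an illustration; the function name and the choice of units are assumptions made here, not part of the original work.

```python
# Illustrative check of dx = CRT * c / 2 (names and units chosen for this sketch).
C_MM_PER_PS = 0.299792458  # speed of light in vacuum, in mm per picosecond


def position_uncertainty_mm(crt_ps: float) -> float:
    """Return the position uncertainty along the LOR, in mm, for a CRT in ps."""
    return crt_ps * C_MM_PER_PS / 2.0


for crt in (200.0, 10.0):
    dx = position_uncertainty_mm(crt)
    print(f"CRT = {crt:5.0f} ps -> dx = {dx:.1f} mm")
```

A CRT of 200 ps gives roughly 30 mm, and 10 ps gives roughly 1.5 mm, consistent with the figures quoted in the caption.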
These specifications require the front-end electronics to be fast, low-noise, and to cope with the expected high dark count rate of the SiPMs.
Moreover, in view of the compactness of the PET system envisaged and its inherent low power budget, the front-end ASIC is required to integrate a high number of readout channels with a limited power consumption.
This article covers the design concepts and implementation of a 64-channel ASIC for TOF-PET, and discusses its architecture and simulation results.
Readout channel architecture
The TOFPET ASIC readout channel is composed of an analogue front-end that amplifies the input signal and delivers two digital signals to a mixed-mode TDC, whose output is a data set containing information on the time of the trigger and the time-over-threshold of the processed input signal.
Time information is extracted by applying a single threshold to the leading edge of a fast signal replica. This technique is susceptible to variations of the trigger time with the amplitude of the pulse [8]; we anticipate that the charge information can be used to correct offline the timing degradation, if any, due to time-walk.
Figure 2. Overview of the channel architecture. The use of two independent signal paths for independent timing and energy measurements was proposed in the past by other authors (e.g. [9]). The analogue input to the energy branch discriminator may be a filtered version of the amplified signal.
On the other hand, the time jitter due to the scintillator statistics is minimized by setting this threshold as low as possible (down to 0.5 p.e.). We refer to it as V_th_T; it is applied to the timing discriminator and generates a trigger pulse called DOT.
A second discriminator is set with a higher threshold V_th_E. The energy discriminator has two purposes: validation of events (i.e. dark count rejection) and provision of a second time stamp used for the time-over-threshold measurement. If its output DOE goes high, the channel logic considers the hit as valid and polls the falling edge of DOE.
The channel logic is embedded in a mixed-signal dual time-to-digital converter (TDC), which provides a 50 ps resolution time stamp for both the rising edge of the Timing Discriminator Output (DOT) and the falling edge of the Energy Discriminator Output (DOE). Figure 2 sketches the structure of the described readout chain. For every energy event, these data are saved into a buffer and a valid hit flag is issued. The data are then collected by the chip global controller in a round-robin scheme, multiplexed, packed into frames and output over a differential serial link.
Front-end circuit
The channel front-end includes a pre-amplifier acting as a current conveyor, two transimpedance amplifier branches and two independent voltage-mode discriminators. The first stage is a regulated-gate cascode (RGC) that conveys the signal from a low input impedance node into a high impedance output. If the feedback loop is a common-source circuit, then the MOS transistor amplifier becomes the dominant noise contributor [7]. The advantage is that the input impedance can now be trimmed without affecting the noise performance of the front-end.
Implementing the RGC input stage with a differential amplifier (as shown in figure 3) reduces the stage bandwidth with respect to a common-source feedback, but it allows an easy adjustment of the input node DC bias (6-bit, 500 mV range) for fine SiPM gain trimming. Again, this adjustment is uncorrelated with the total noise at the output. In such an implementation, the main contributors to the total output noise voltage are the differential pair input transistors.
Due to the stray capacitance of the SiPM (typically C_d = 30 pF/mm² of active area), the input resistance R_in of the front-end is required to be low. Beyond its effect on the bandwidth (lowering R_in shifts the dominant pole R_in·C_d towards higher frequencies), one is interested in matching the line impedance. The input resistance of the front-end can be adjusted between 10 and 60 Ω.
Thanks to the use of a differential closed-loop input stage, this variation is independent both of the noise performance and the input node DC voltage.
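To give a feeling for the effect of the input resistance on the dominant pole, the sketch below evaluates the standard first-order expression f = 1/(2π·R_in·C_d) at the two ends of the adjustment range. It assumes the 320 pF terminal capacitance quoted for a 3x3 mm device and treats the input node as a single RC pole, which is a simplification; the function name is chosen for this illustration only.

```python
import math


def input_pole_hz(r_in_ohm: float, c_d_farad: float) -> float:
    """First-order pole frequency of the input node, f = 1 / (2*pi*R_in*C_d)."""
    return 1.0 / (2.0 * math.pi * r_in_ohm * c_d_farad)


C_D = 320e-12  # assumed detector capacitance (3x3 mm SiPM), in farad

for r_in in (10.0, 60.0):
    f_mhz = input_pole_hz(r_in, C_D) / 1e6
    print(f"R_in = {r_in:4.0f} ohm -> pole at about {f_mhz:.1f} MHz")
```

Under these assumptions the pole moves from roughly 8 MHz at 60 Ω to roughly 50 MHz at 10 Ω.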
A nominal value of 1.5 mW dissipated by the regulation loop amplifier is needed for triggering on the first p.e. with an SNR of 23.5 dB (conditions for a device with 320 pF terminal capacitance, i.e. a SiPM with 3x3 mm active area). Under these conditions, the total rms noise voltage at the input of the timing discriminator is less than 3 mV. On the other hand, if higher thresholds for the time trigger are allowed (2-5 p.e.), then the SNR goes up: allowing higher levels of total rms output noise voltage means that the power consumption of the front-end, which is mostly used to mitigate the thermal noise of the regulation loop devices, can be reduced.
The AC coupling concept is quite simple, consisting of a metal-over-metal capacitor and a back-to-back configuration of two MOSFETs in the cutoff region (the sub-threshold current imposes a high-impedance node) to set the baseline of the post-amplifier. Process corner and temperature simulations show a pile-up no larger than 20 mV at a 40 kHz event rate (Q_in = 300 pC), which is tolerable considering that a minimum threshold of 75 mV corresponds to single photon triggering.
Two independent amplifier branches generate V_out_T and V_out_E, sampled by two voltage comparators whose thresholds are set by 6-bit DACs with a configurable range and LSB. Each channel therefore includes three independent current-mode DACs (two for fixing the thresholds, one for the DC bias of the input node), set by the local channel configuration register.
A selectable shaping function can be applied to the energy branch. Integrating V_out_E can prevent re-triggering of the TDC control and correct a possible loss of monotonicity of the ToT vs. input charge characteristic curve.
The expected ToT curve is non-linear and an off-system calibration using discrete radiation sources is needed. After that, internal calibration circuitry is used to evaluate the channel-by-channel matching and for on-system monitoring in case a degradation of the energy resolution is observed. The calibration circuit consists of a top-level current generator that injects a voltage step of controllable amplitude into a 180 pF capacitor, which is spread amongst the 64 channels. A suitable choice of the RC zero of the differentiator creates a fast current pulse with a 40 ns decay time. The dynamic range achieved with such a scheme is roughly 200 pC. Figure 4 plots the simulation results of the ToT obtained both with the internal calibration circuit and with an ideal large signal model of the LYSO+SiPM.
Time-to-digital converter
The mixed-mode Time-to-Digital Converter block is built up of two analogue TDCs, a channel logic control and data registers. The input to this block is a set of two trigger signals, the outputs of the timing (DOT) and energy (DOE) discriminators. Figure 5(a) illustrates the principle of operation. For each energy hit, two time stamps are derived (t_0 and t_2) with a 50 ps resolution. From these we obtain both a precise timing and the time-over-threshold information.
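The dual-threshold principle can be illustrated in software on a sampled waveform. The sketch below is a behavioural model only, not the channel logic of the ASIC: it assumes a single sampled waveform feeding both discriminators (the chip uses two separate amplifier branches), a timing threshold lower than the energy threshold, and the hypothetical function name dual_threshold.

```python
from typing import Optional, Sequence, Tuple


def dual_threshold(
    times: Sequence[float],
    volts: Sequence[float],
    vth_t: float,
    vth_e: float,
) -> Optional[Tuple[float, float]]:
    """Return (t0, ToT) for the first valid hit, or None if no hit is validated.

    t0 is the first sample at or above the low (timing) threshold.
    The hit is validated only if the signal also reaches the high (energy)
    threshold; t2 is the first later sample that falls below it again.
    """
    t0 = None        # candidate time stamp from the timing discriminator
    e_high = False   # state of the energy discriminator output
    for t, v in zip(times, volts):
        if t0 is None and v >= vth_t:
            t0 = t                      # rising edge of the timing trigger
        if t0 is not None:
            if not e_high and v >= vth_e:
                e_high = True           # hit validated
            elif e_high and v < vth_e:
                return t0, t - t0       # falling edge of energy trigger: t2
            elif not e_high and v < vth_t:
                t0 = None               # pulse ended unvalidated: discard it
    return None


times = [float(i) for i in range(10)]
hit = [0.0, 0.1, 0.5, 1.0, 0.8, 0.6, 0.3, 0.1, 0.0, 0.0]
dark = [0.0, 0.1, 0.2, 0.1, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
print(dual_threshold(times, hit, vth_t=0.05, vth_e=0.4))
print(dual_threshold(times, dark, vth_t=0.05, vth_e=0.4))
```

The small pulse crosses only the low threshold and is discarded, which mirrors the dark count rejection role of the energy discriminator.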
The core of the circuit is the channel logic control and a pair of analogue multi-buffered TDCs based on time interpolation. Each measurement uses the value of a 10-bit global counter (distributed in Gray code to each channel), latched synchronously, and the phase of the trigger with respect to the global master clock. Note that, if the global counter is clocked by a 160 MHz clock, then the time resolution achievable just by using the coarse time stamp would be T_CLK = 6.25 ns. The absolute time stamp is obtained by concatenating this 10-bit coarse time stamp with an 8-bit fine measurement.
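Distributing the counter in Gray code ensures that only one bit changes between consecutive values, so a latch that samples during a transition is off by at most one count. The sketch below shows the standard binary-reflected Gray code conversions for a 10-bit value; it illustrates the encoding in general and is not taken from the ASIC design.

```python
COUNTER_BITS = 10  # width of the global counter


def binary_to_gray(n: int) -> int:
    """Convert a binary count to binary-reflected Gray code."""
    return n ^ (n >> 1)


def gray_to_binary(g: int) -> int:
    """Convert a Gray-coded value back to a binary count."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n


# Consecutive counter values differ in exactly one bit, including the wrap-around.
for value in range(1 << COUNTER_BITS):
    nxt = (value + 1) % (1 << COUNTER_BITS)
    changed = binary_to_gray(value) ^ binary_to_gray(nxt)
    assert bin(changed).count("1") == 1
    assert gray_to_binary(binary_to_gray(value)) == value
```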
While the coarse time is saved by latching locally the value of the global counter, the fine timing is obtained by an analogue TDC. This fine time stamp is a direct measure of the phase of the asynchronous pulse with respect to the master clock, using a set of time-to-analogue converters (TAC) and an ADC. An 8-TAC interpolator for precise time measurements has been described in [10]. In [11] and [12], a sub-micron implementation of a time interpolation multi-buffered TDC with 1 mW power consumption is reported.
Figure 5. (a) Dual-threshold scheme: a low-threshold trigger (few p.e.) tags t_0, providing also the first time measurement for the ToT calculation. The falling edge of the higher threshold discriminator sets t_2, from which the ToT can be derived. This higher threshold (V_th_E) is also used for dark count rejection; the indicated "hit validation" flag is used by the channel logic control to discard low-energy events. (b) Simulation result of a 511 keV γ hit: the input vector is obtained with GAMOS/c++. The state of DOE (output of the energy discriminator) is polled and the coarse+fine time measurements of t_0 and t_2 are saved locally if the hit is valid.
The time-to-analogue conversion works as follows. For each write command, a capacitor is charged with a constant current until a known clock phase is reached. Depending on the parity of the global counter, this phase has an overhead of 1 or 2 full clock cycles. This forces a charging cycle of at least one clock period, in order to avoid hard-to-correct measurement offsets due to very short charging cycles. The duration of the charging cycle is controlled by the wtac_T and wtac_E commands, as shown in figure 5(b).
Transferring this charge into a 4x bigger capacitor and discharging it with a 32x smaller current yields a time multiplication factor of 128. Working at 160 MHz, this mechanism therefore provides a time binning of about 50 ps. This discharge is herein called conversion, and the voltage signal is monitored by a latched comparator. The value of the global 10-bit counter is latched at the start and at the end of the conversion. Since a multi-buffer approach is used to de-randomize the input event rate, the inherently high conversion time of these circuits is masked.
Figure 6. Simulation results: outputs of front-end and channel logic. O_wtac_T and O_wtac_E (top) are write commands generated by the TDC control for the fine measurements of the time (low-Vth) and energy (high-Vth) triggers; V_out_T and V_th_T are the input and threshold signals of the voltage discriminators of the timing branch; the equivalent signals for the energy branch are plotted with the suffix E. DOT_int and DOE_int refer to the output of the discriminators. V_out_E is not shaped: the de-excitation of the crystal may cause the energy discriminator to re-trigger (as seen in the figure). Although this can be managed by the TDC control, applying a shaping function to V_out_E reduces the probability of these spurious triggers.
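The time binning that follows from the time multiplication factor of 128 at 160 MHz can be reproduced with a few lines of arithmetic. The constant names below are chosen for this illustration; the numerical values are those quoted in the text.

```python
F_CLK_HZ = 160e6     # master clock frequency
CAP_RATIO = 4        # conversion capacitor is 4x the charging capacitor
CURRENT_RATIO = 32   # discharge current is 32x smaller than the charging current

t_clk_ps = 1e12 / F_CLK_HZ                     # clock period in ps
multiplication = CAP_RATIO * CURRENT_RATIO     # time multiplication factor
bin_ps = t_clk_ps / multiplication             # input time per counted clock cycle

print(f"T_CLK = {t_clk_ps:.0f} ps, factor = {multiplication}, bin = {bin_ps:.1f} ps")
```

One clock period of conversion time corresponds to about 48.8 ps of input time, which is the nominal 50 ps binning.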
For every valid measurement, a set of five 10-bit words and the ID of the TAC used are written into a local data buffer. These words are the values of the global counter at t_0, at t_2, at the start of conversion (SOC; both TDCs start simultaneously) and at the end of conversion (EOC) of each of the two TDCs. On-chip processing reduces this information into a 40-bit stream, to which the channel ID (6-bit) is added: t_2 is converted into a 6-bit offset with respect to t_1, and the arithmetic subtraction of EOC and SOC results in the 8-bit fine time. The 6.4 µs (2^10 clock cycles) range of the counter allows the backend to build up a monotonically increasing time stamp. The channel logic controls both the buffer assignment and analogue switching, as well as the data registers and the interface with the global chip controller. Figure 6 plots the front-end amplifier outputs and the write commands issued by the TDC control for a sequence of dark pulses and a 511 keV event. If the deposited charge is higher than the defined energy threshold, two write commands, wtac_T and wtac_E, are issued for the fine time measurement of t_0 and t_2, respectively. Since the system must allow triggering on the first p.e., it is also sensitive to the charge of a pixel dark pulse. That also means that it must be able to reject the first time measurements of these spurious events, with little or no impact on the channel dead time.
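How a backend might extend the wrapping 10-bit coarse counter into a monotonically increasing time stamp can be sketched as follows. This is a hypothetical illustration, not the collaboration's backend code: it assumes that events arrive in time order and that consecutive events are separated by less than one counter range (6.4 µs); the function and constant names are invented here.

```python
from typing import Iterable, List

COUNTER_BITS = 10
COUNTER_RANGE = 1 << COUNTER_BITS   # 1024 clock cycles
T_CLK_NS = 6.25                     # clock period at 160 MHz


def unwrap_coarse(counts: Iterable[int]) -> List[int]:
    """Extend wrapping 10-bit counts into a non-decreasing clock-cycle count.

    A new epoch is assumed whenever a count is smaller than the previous one.
    """
    unwrapped = []
    epoch = 0
    previous = None
    for count in counts:
        if previous is not None and count < previous:
            epoch += 1              # the counter rolled over
        unwrapped.append(epoch * COUNTER_RANGE + count)
        previous = count
    return unwrapped


cycles = unwrap_coarse([1000, 1020, 5, 300, 10])
print(cycles)
print([c * T_CLK_NS for c in cycles])  # coarse time stamps in ns
```

In a real system an explicit rollover marker or a wider backend counter would be needed when the event rate is too low to guarantee one event per counter period.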
64-channel TOFPET ASIC
The TOFPET ASIC consists of a 64-channel analogue block, calibration circuitry, golden-reference and bias generators and a global controller. The global biasing of the TOFPET chip is provided by global reference current generators and a set of fourteen 6-bit binary-weighted DACs. A subset of these cells is used for the biasing of the front-end and is placed alongside channel CH0 (right edge in figure 7). The references for the TDC and the calibration circuitry are placed next to channel CH63. Two golden-reference generators, whose inputs are served by dedicated pads, allow further adjustment of the dynamic range of the bias cells and the calibration circuitry. The chip powers up in safe mode, i.e., loading default global and channel configuration words that avoid a noisy or unstable start-up.
The chip is usable with p-type or n-type inputs (hole/electron collection devices) and with higher light yield crystals (the coarse gain of the TIA can be stepped down). The input impedance of the front-end is adjustable (typical range from 10 to 60 Ω).
Data is serialized and output through an LVDS interface, with 8B/10B encoding and a bandwidth of 160-640 Mbit/s. Data transmission uses TX training or an output clock (CLK out) for synchronization with the front-end board FPGA. An additional 10 MHz SPI-like configuration interface makes it possible to mask screamer channels (e.g. due to noisy SiPMs), write/read channel settings (buffer current, channel enable, gain, thresholds, etc.) and global settings (coarse buffer current, TAC refresh rate, TX training mode, etc.), generate test sequences and TAC calibration sequences, and read the channels' dark counts and trigger errors.
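As a rough illustration of what 8B/10B encoding implies for the output link, the sketch below estimates the payload rate and an upper bound on the event rate. Several assumptions are made here that are not stated in the text: the quoted 160-640 Mbit/s is taken to be the line rate, each event is taken to occupy 46 bits (the 40-bit stream plus the 6-bit channel ID), and frame overhead is ignored.

```python
BITS_PER_EVENT = 46  # assumed: 40-bit event data + 6-bit channel ID, no framing


def payload_rate_mbps(line_rate_mbps: float) -> float:
    """8B/10B carries 8 payload bits in every 10 transmitted bits."""
    return line_rate_mbps * 8 / 10


def max_event_rate_mhz(line_rate_mbps: float, bits_per_event: int = BITS_PER_EVENT) -> float:
    """Upper bound on the event rate (million events per second), ignoring framing."""
    return payload_rate_mbps(line_rate_mbps) / bits_per_event


for line_rate in (160, 640):
    print(
        f"line rate {line_rate} Mbit/s -> payload {payload_rate_mbps(line_rate):.0f} Mbit/s, "
        f"at most {max_event_rate_mhz(line_rate):.2f} Mevents/s"
    )
```

The actual throughput depends on the frame format, which is not described here.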
Internally, the global controller distributes to each channel a complete and independent set of signals: clock, reset, global 10-bit counter, configuration vector, etc.
Besides the synchronous test-mode of the TDC (for calibration), a test pulse IO is available for linearity and other asynchronous testing.
The 64-channel TOFPET is implemented in an 8-metal standard 130 nm CMOS technology. It is wire-bonded and has one edge free of pads, such that two abutting chips can be packaged into a 128-channel IC.
A 17x17 mm FBGA package was chosen as the baseline solution for the 128-channel casing, decreasing the signal input line stray inductance and allowing a compact assembly of front-end readout modules.
JINST 8 C02050
Conclusion and outlook
A 64-channel chip for the readout of silicon photomultipliers coupled to fast scintillators was designed. The ASIC is developed for TOF-PET, but it may also be suitable for other applications (e.g. astrophysics). For the targeted application, the ASIC is expected to be integrated in a system with 200 ps timing resolution. Time and energy measurements (using time-over-threshold) are performed with a low-power mixed-mode TDC based on time interpolation, with a time binning of 50 ps. The analogue front-end includes a closed-loop, low-noise, low input impedance amplifier, which allows setting on-chip the bias voltage of the SiPM within a range of 500 mV. Silicon test results are expected during the first quarter of 2013.
