Abstract - There is considerable interest in developing new time-of-flight detectors using micro-channel-plate photomultiplier tubes (MCP-PMTs). The question we pose in this paper is whether available waveform digitizer ASICs, such as the WaveCatcher or TARGET, operating at a sampling rate of 2-3 GSa/s, can compete with 1GHz BW CFD/TDC/ADC electronics. We have performed a series of measurements with these waveform digitizers connected to MCP-PMTs operating at low gain, with a signal equivalent to ~40 photoelectrons. These tests were performed using a laser diode to illuminate the photodetectors under conditions comparable to those of previous SLAC and Fermilab beam tests. Our results indicate that similar timing resolution can be achieved with both methods. Although commercial CFD-based electronics are readily available and perform very well, they are impractical for large-scale systems. In contrast, ASIC-based waveform recording electronics are well suited to such applications, and do not require the analog delay lines that make CFDs difficult to incorporate in ASIC designs.
INTRODUCTION
The opportunity to build high-resolution TOF counters from fast MCP-PMTs with 10μm pores and a 1cm-thick quartz radiator producing Cherenkov light (see Fig. 1) motivates a careful choice of readout electronics. The traditional method uses constant-fraction discriminators (CFDs) coupled to high-resolution time-to-digital converters (TDCs), together with analog-to-digital converters (ADCs) to correct residual amplitude-dependent timing shifts ("time walk") that the CFD does not entirely remove [1]. Using this standard technique we have achieved a timing resolution of ~14ps per counter in both test beam and laser bench tests [1] (see also Fig. 2). The MCP-PMTs were operated at a low gain of 2-3×10^4 in these tests, because the aim was to avoid sensitivity to single photons, which are a dominant background in high-luminosity e+e− collider detectors; ideally, the detectors should be sensitive only to charged-particle tracks. This is a crucial point, since MCP-PMT aging and high-rate issues become less severe at lower gain. However, one has to offset this handicap by using a thicker radiator to produce more photoelectrons.
This work was supported by the Department of Energy, contract DE-AC02-76SF00515, and DOE Advanced Detector Research award DE-FG02-08ER41571, and by the French P2I research consortium (Physics of the Two Infinites) on the theme "miscellaneous highly specialized programs" under the name "Development of analog multi-GigaHertz acquisition systems".
Fig. 1 A side view of the prototype setup, consisting of two nearly identical Burle/Photonis MCP-PMTs with 10μm MCP pores. This detector setup was used for the timing measurements reported in this paper and is the same as that used in a Fermilab beam test [1]. In the test beam we used a 1cm-thick quartz radiator with Al-coated cylindrical sides, which yields Npe ~35±5 photoelectrons, producing a total charge of Ntotal > 6-8×10^5 avalanche electrons, the minimum necessary to obtain good timing resolution. 1

In order to permit safe operation of the MCP by mitigating aging effects in a high-rate environment, one wants to keep the total avalanche charge as low as possible. At the same time, the signal amplitude must still be sufficient to obtain good timing resolution. A quartz radiator produces a prompt Cherenkov light signal, which is an essential ingredient for fast and precise timing. As shown in Fig. 1, we also inject a fast light pulse from a PiLas laser 2 into the MCP through the quartz radiator, and determine the timing resolution under this condition. In the laser tests we could vary the number of photoelectrons, as shown in Fig. 2, by adding Mylar attenuators. For the tests of the waveform digitizing electronics we selected Npe ~40, as this approximately matches the number of photoelectrons measured in the Fermilab test beam [1].

1 The expected timing resolution with threshold-type electronics is $\sigma_t = \sigma_{noise}/(dS/dt)_{thresh} \approx t_r/(S/N)$, where $\sigma_{noise}$ is the rms noise, $(dS/dt)_{thresh}$ is the derivative of the signal evaluated at the threshold, $t_r$ is the pulse rise time, and $S/N$ is the signal-to-noise ratio. Therefore the rise time $t_r$ and S/N are crucial variables for good timing resolution. One can lower the gain only if the noise is correspondingly reduced.
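The relation in footnote 1, σt = σnoise/(dS/dt), can be made concrete with a short numerical sketch. It assumes, for illustration only, a linear leading edge whose 10-90% rise takes t_r, so that the slope at threshold is 0.8·S/t_r and σt = t_r/(0.8·S/N); the function name and the example numbers are hypothetical, not values measured in this work.

```python
def threshold_jitter(noise_rms, amplitude, rise_time_10_90):
    """sigma_t = noise_rms / (dS/dt) for an assumed linear 10-90% leading edge."""
    slope = 0.8 * amplitude / rise_time_10_90   # dS/dt at the threshold
    return noise_rms / slope


if __name__ == "__main__":
    # Hypothetical pulse: 100 mV amplitude, 1 mV rms noise (S/N = 100), 500 ps rise time
    sigma_t = threshold_jitter(noise_rms=1e-3, amplitude=0.1, rise_time_10_90=500e-12)
    print(f"sigma_t = {sigma_t * 1e12:.2f} ps")   # prints sigma_t = 6.25 ps
```

Halving the rise time or doubling S/N halves the jitter, which is why both quantities appear in the footnote as the crucial variables.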
Recently, high-resolution timing measurements based on waveform digitizers utilizing analog memories have been considered. In this paper we evaluate whether the 2.5 GSa/s TeV Array Readout GSa/s Electronics with Event Trigger (TARGET) [2] [3] [4] and the WaveCatcher waveform digitizers can compete with CFD/TDC/ADC electronics. Figure 3 shows two possible applications of the results presented in this paper in a so-called "pixelated" TOF detector. 4 In the first configuration (a), the Cherenkov radiator consists of quartz cubes, each optically isolated by an aluminum reflective coating applied to its sides. In (b), a stepped-face PMT window achieves the same radiator thickness, though it will suffer from worse timing resolution near the PMT edges.
Such pixelated TOF detectors have been proposed for particle identification in the SuperB endcap [6], and we have therefore chosen the operating configuration described above for this comparison. It should be stressed that a different operating point may require a new optimization study, as pulse shapes may change, particularly at gains suitable for single-photoelectron detection. We chose this example because we had good existing data from the Fermilab test beam, as well as extensive, high-quality reference laser measurements with the CFD/TAC/ADC electronics. Figure 4 summarizes the results from a 120 GeV/c proton beam at Fermilab [1], where the details are provided. Figure 4a shows the results for all events without any ADC cut or CFD time-walk correction. Figure 4b shows the final resolution of σsingle_detector ~14ps per counter, obtained with tighter cuts on the MCP-PMT pulse heights. An important point is that in each event the same particle passed through both counters. The electronics used for this test are shown in Fig. 5a. Results from the SLAC and Fermilab beam tests are included in Fig. 2. The laser tests used an 80:10:10 fiber splitter 5 to inject the light signal into the two detectors at the same time. The single-detector resolution is obtained by dividing the measured two-detector resolution by √2. The laser diode produced a 1 mm beam spot on the MCP face. Figure 2 illustrates the measured timing resolution as a function of the number of photoelectrons 6 [...] 7, if we assume that the transit time spread (the resolution for a single photoelectron) is TTS(extrapolated to Npe = 1) ~120 ps. Such a large TTS value is consistent with our choice of low-gain operation, which keeps the response linear for signals of up to Npe ~30-50, where we measure σsingle_detector ~20 ps, as seen in Fig. 2.

5 The fiber splitter was made by Global Opticom Inc., P/N MP63AV0103DL333.

σtwo_detectors/√2 ~3.42ps.
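The two relations used above, dividing the start-stop width by √2 and scaling the single-photoelectron TTS by 1/√Npe, can be checked with a few lines of code. This is a sketch that assumes two statistically identical, independent counters and pure photostatistics (no electronics or other contributions); the function names are illustrative.

```python
import math


def single_from_pair(sigma_pair):
    """Per-counter resolution from the start-stop width of two identical counters."""
    return sigma_pair / math.sqrt(2.0)


def photostat_limit(tts_single_pe, npe):
    """Photostatistics-only estimate: single-photoelectron TTS scaled by 1/sqrt(Npe)."""
    return tts_single_pe / math.sqrt(npe)


if __name__ == "__main__":
    for npe in (30, 40, 50):
        print(npe, f"{photostat_limit(120.0, npe):.1f} ps")
```

With TTS ~120 ps this gives roughly 22, 19 and 17 ps for Npe = 30, 40 and 50, in line with the ~20 ps quoted above.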
The ADC114/TAC588 time scale was calibrated with a special pulser 8, giving 3.1ps/count [1].

6 Light from the laser diode was attenuated using Mylar sheets, and Npe was determined by several methods: (a) oscilloscope, (b) ADC measurement, and (c) statistical arguments.
TIMING METHODS

A. Beam test and laser test results with CFD/TDC/ADC
B. The laser test setup for the waveform analysis
The laser test bench setup for the waveform digitizing electronics uses two Hamamatsu C-5594-44 amplifiers (1.5 GHz BW, gain of 63), each coupled to its detector via a 6-inch SMA cable. Figure 6 illustrates this configuration. The laser diode operates at a wavelength of 407 nm, and an 80:10:10 fiber splitter injects the light signals into the two detectors at the same time.
C. Timing results with the WaveCatcher ASIC board
The USB WaveCatcher board is well suited to the acquisition of fast analog signals over a short time window. The current version is based on the SAM chip [7]. This ASIC, designed in the AMS 0.35μm CMOS process, integrates two channels of ultra-fast differential analog memories of 256 cells each, arranged in a matrix structure and based on a common CEA/IRFU and IN2P3/LAL patent [8]. The chip samples analog signals at up to 3.2GSa/s, with the rate defined by an internal servo-controlled delay line loop. The samples are stored in an array of capacitors that can be fully or partially read back and digitized by an external ADC operating at a moderate conversion frequency (10MHz).
The WaveCatcher board, measuring 149x77 mm, communicates via USB 2.0 and has sufficiently low power consumption (<2.5W) that it can be powered solely by the USB bus.
Waveform sampling is performed with a depth of 256 points on 2 DC-coupled analog channels. The analog bandwidth exceeds 500MHz with over 12 bits of resolution at a sampling frequency (Fs) that can be configured as 400, 800, 1600 and 3200 MSa/s (3.2 GSa/s). This corresponds to a sampling window ranging from 80ns at 3.2GSa/s up to 640ns at 400MSa/s. Each channel also contains a pulse generator for reflectometry measurements, as well as the capability to perform signal integration, such as direct measurement of a PMT signal charge. In this latter operating mode, the sustainable trigger rate can increase to a few tens of kHz.
The analog input range can be individually offset using 16-bit DACs over the full ±1.25V dynamic range, thus optimizing the SNR for a given signal shape.
The trigger signals can be internally generated using individual discriminators on each channel and with thresholds set by a 16-bit DAC. Internal random triggers, a software trigger, or an external trigger may be used. These board triggers can also be broadcast externally through an LVCMOS trigger output, simplifying external trigger synchronization. In addition, trigger rate scalers are provided, permitting trigger rate monitoring independent of event readout.
The board can also be used as a TDC for high-precision time measurement between two signals. The timing signals can be present either on the same input channel or on two different channels, with the constraint that their separation must be smaller than 16 clock periods (one clock period = 5ns at 3.2GSa/s, up to 40ns at 400MSa/s). The measured sampling time precision is better than 10 ps rms at 3.2GSa/s.
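The sampling windows and clock periods quoted for the WaveCatcher follow from simple arithmetic: 256 cells divided by the sampling frequency gives the recording window, and the quoted clock periods (5 ns at 3.2GSa/s, 40 ns at 400MSa/s) both correspond to 16 samples, so the 16-clock-period TDC limit equals the full 256-cell memory depth. The sketch below is our own illustration; the value of 16 samples per clock is inferred from those numbers rather than taken from a datasheet.

```python
DEPTH = 256            # analog-memory cells per channel
CELLS_PER_CLOCK = 16   # inferred: 5 ns x 3.2 GSa/s = 40 ns x 400 MSa/s = 16 samples


def window_and_clock(fs_hz, depth=DEPTH, cells_per_clock=CELLS_PER_CLOCK):
    """Return (recording window, clock period) in seconds for sampling frequency fs_hz."""
    return depth / fs_hz, cells_per_clock / fs_hz


if __name__ == "__main__":
    for fs in (400e6, 800e6, 1600e6, 3200e6):
        window, clock = window_and_clock(fs)
        print(f"{fs / 1e6:5.0f} MSa/s: window {window * 1e9:4.0f} ns, clock {clock * 1e9:4.1f} ns")
```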
While power is normally provided from USB, the board can also be powered by a +5V external supply through a standard 2.1mm jack plug. Although the default connectors are BNC, SMA or LEMO connectors can be installed instead. Fig. 7a shows a photograph of the WaveCatcher board used in these tests. Waveforms were acquired with a sampling interval of 312.5ps/bin (3.2GSa/s) and an analog BW of 500MHz. Fig. 7b shows the "oscilloscope-like" software interface developed for this waveform digitizer and used to set up this measurement. To obtain the best timing measurements, it is necessary to correct for integral non-linearity (INL) in the actual sample acquisition times of the analog memory. Such corrections are typically necessary with this type of digitizer. The effect is illustrated in Fig. 8a, where the sampling points (blue points), which should be equidistant, are not, and therefore the real signal (black) may be distorted into the "fake" one (dashed blue).
To calibrate out this INL, a method using a precise sine wave generator 9 has been developed. This technique uses a well-chosen sine wave signal (135 MHz, 500mV rms, as shown in Fig. 8b) to calibrate the segments that cross the mid-scale of the dynamic range. Near mid-scale these segments are assumed to be straight lines. With sufficient statistics, the mean length of these segments directly determines the differential non-linearity (DNL) in time, whereas the jitter on their length estimates the jitter on the measurement time.
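A minimal simulation of the sine-wave calibration idea is sketched below. It assumes (hypothetically) a linear array of 256 segments whose true lengths deviate from 312.5 ps with the 7.5 ps rms DNL reported for the uncalibrated board, a total span locked by the servo-controlled delay loop, a 135 MHz, 500 mV rms sine with random phase in each event, and mid-scale at 0 V. For every segment that straddles mid-scale, the voltage step |ΔV| is nearly proportional to the segment's time length, so the mean |ΔV| per segment, rescaled to the known total span, estimates the segment lengths; the INL is then the running sum of the DNL. This illustrates the principle only and is not the WaveCatcher calibration software.

```python
"""Sketch: sine-wave calibration of time DNL/INL in a switched-capacitor sampler.

Hypothetical model, for illustration only: 256 segments whose true lengths
deviate from 312.5 ps but whose total span is locked (servo-controlled loop),
a 135 MHz / 500 mV rms sine with random phase in every event, mid-scale at 0 V.
"""
import math
import random

T_NOM = 312.5e-12           # nominal sampling interval, 3.2 GSa/s (s)
N_SEG = 256                 # inter-sample segments to calibrate
F_SINE = 135e6              # calibration sine frequency (Hz)
AMP = 0.5 * math.sqrt(2.0)  # 500 mV rms -> ~0.707 V amplitude


def true_sample_times(rng, dnl_rms=7.5e-12):
    """Hypothetical true sampling instants; the DNL sums to zero over the array."""
    deltas = [rng.gauss(0.0, dnl_rms) for _ in range(N_SEG)]
    mean = sum(deltas) / N_SEG
    times = [0.0]
    for d in deltas:
        times.append(times[-1] + T_NOM + d - mean)
    return times


def calibrate_segments(times, n_events, rng, noise_v=1e-3):
    """Estimate segment lengths from the mean |dV| of segments crossing mid-scale."""
    omega = 2.0 * math.pi * F_SINE
    sum_dv = [0.0] * N_SEG
    hits = [0] * N_SEG
    for _ in range(n_events):
        phase = rng.uniform(0.0, 2.0 * math.pi)
        v = [AMP * math.sin(omega * t + phase) + rng.gauss(0.0, noise_v) for t in times]
        for j in range(N_SEG):
            if v[j] * v[j + 1] < 0.0:            # segment straddles mid-scale
                sum_dv[j] += abs(v[j + 1] - v[j])
                hits[j] += 1
    if min(hits) == 0:
        raise ValueError("some segments never crossed mid-scale; take more events")
    mean_dv = [s / h for s, h in zip(sum_dv, hits)]   # ~ proportional to segment length
    scale = N_SEG * T_NOM / sum(mean_dv)              # total span is known
    return [m * scale for m in mean_dv]


def inl_from_segments(segments):
    """DNL = segment - nominal; INL = running sum of DNL (INL of sample 0 is 0)."""
    inl = [0.0]
    for s in segments:
        inl.append(inl[-1] + (s - T_NOM))
    return inl


def rms(xs):
    return math.sqrt(sum(x * x for x in xs) / len(xs))


if __name__ == "__main__":
    rng = random.Random(1)
    times = true_sample_times(rng)
    true_inl = [t - k * T_NOM for k, t in enumerate(times)]
    est_inl = inl_from_segments(calibrate_segments(times, 2000, rng))
    resid = [a - b for a, b in zip(true_inl, est_inl)]
    print(f"true INL {rms(true_inl) * 1e12:.1f} ps rms, residual {rms(resid) * 1e12:.2f} ps rms")
```

In this idealized model the residual INL after calibration is expected to be a small fraction of the simulated INL; the exact value depends on the assumed noise and the number of events.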
Integrating the DNL and rescaling it with the clock period gives the integral non-linearity in time (time INL). This INL correction, which is stable over the long term, can be used in two different ways to move the samples of an event to their actual locations in time. The first is to recreate equidistant samples (green crosses in Fig. 8a) using second-order Lagrange polynomial interpolation, for instance if the signal is to be displayed on screen or used in an FFT. The second is simply to use the real time positions of the few points (red points in Fig. 8a) needed for a given measurement, such as the precise CFD time measurement described below.

9 High precision 8656B HP gate sine wave generator, 0.
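The first correction mode, recreating equidistant samples with second-order Lagrange interpolation, can be sketched as follows. It assumes the INL-corrected sample times are already known and sorted, and uses the three corrected samples nearest each target time; the function names are hypothetical.

```python
import bisect
import math


def lagrange3(ts, vs, x):
    """Second-order Lagrange polynomial through (ts[i], vs[i]), i = 0..2, evaluated at x."""
    (a, b, c), (va, vb, vc) = ts, vs
    return (va * (x - b) * (x - c) / ((a - b) * (a - c))
            + vb * (x - a) * (x - c) / ((b - a) * (b - c))
            + vc * (x - a) * (x - b) / ((c - a) * (c - b)))


def resample_equidistant(t_real, v, t_start, dt, n):
    """Rebuild n equidistant samples t_start + k*dt from INL-corrected times t_real."""
    out = []
    for k in range(n):
        x = t_start + k * dt
        i = bisect.bisect_left(t_real, x)          # first corrected time >= x
        i = min(max(i, 1), len(t_real) - 2)        # keep the 3-point stencil inside the array
        out.append(lagrange3(t_real[i - 1:i + 2], v[i - 1:i + 2], x))
    return out


if __name__ == "__main__":
    t_real = [k * 312.5 + (10.0 if k % 2 else -10.0) for k in range(16)]  # hypothetical INL (ps)
    v = [math.sin(2.0 * math.pi * 1.35e-4 * t) for t in t_real]          # 135 MHz, t in ps
    print(resample_equidistant(t_real, v, 0.0, 312.5, 16)[:4])
```

A three-point stencil is exact for any quadratic signal, which keeps the interpolation cheap while following the pulse curvature between samples.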
The effect of this correction is very significant. For the WaveCatcher board, the DNL distribution was 7.5ps rms before calibration and 0.33ps rms after. Similarly, the INL distribution was 16.9ps rms before calibration and 1.15ps rms after. The laser repetition rate was set to 100Hz for these tests (see Fig. 6). The MCP-PMT voltages were set to 2.21 and 2.1kV to operate at a gain of 2-3×10^4, and the laser intensity was adjusted to give a net charge similar to that in the Fermilab test [1]. The WaveCatcher took data with a nominal sampling interval of 312.5ps/bin. The first analysis step was a spline interpolation of the waveform, using either 1ps or 10ps time bins (in the end, 10ps binning was found to be sufficient). Figure 9 shows MCP-PMT pulses recorded in two ways: by the WaveCatcher board, with a spline fit and a 10ps interpolation step, and by a 1GHz BW digital oscilloscope. Two timing methods were employed. The first, shown in Fig. 10a, is a software CFD method, which consists of normalizing the pulses to the same peak amplitude and applying a constant-fraction threshold, usually set to 18-22% of the peak amplitude. The second is a so-called reference timing method, in which one first determines a reference pulse shape (see Fig. 10b). The pulse time is then determined by stepping the chosen reference pulse through the waveform and calculating a χ² over a certain number of time bins 10. The choice of this comparison window needs to be tuned to obtain the best performance. One can use, for example, a second-order polynomial fit to only the leading edge of the average pulse profile of normalized pulses (see Fig. 11).

Fig. 10 (a) Software CFD timing: waveforms are normalized to a common peak amplitude, and a threshold of 22% of this peak amplitude is used to determine the timing. (b) Average pulse shape used for the reference timing χ² algorithm (black is the average, red shows the ±2σ contour).

10 Although we call it a χ² method, it is closer to a least-squares method with all errors set to 1.0. To find the optimum bin in 1ps steps, we find the minimum of $\chi^2(k) = \sum_{j} \left\{ \mathrm{Factor} \cdot \left[ \mathrm{time\_spline\_fctn}[k+j] - \left(p_0 + p_1 j + p_2 j^2\right) \right] \right\}^2$, where time_spline_fctn[k+j] is the spline interpolation through the digitized data, Factor is an arbitrary normalization constant, $p_0$, $p_1$ and $p_2$ are the polynomial constants representing the signal leading edge, j is the loop index over the leading pulse edge, and k is an overall offset index.
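The χ² scan of footnote 10 is sketched below for a waveform already spline-interpolated onto a fine, uniform grid (10 ps bins here). The leading-edge template is the quadratic p0 + p1·j + p2·j² and every residual carries unit weight, as in the footnote; Factor is dropped because it rescales χ² without moving its minimum. The names and the test pulse are illustrative assumptions, and no sub-bin refinement of the minimum is attempted.

```python
def chi2_scan(wave, p, n_bins, offsets):
    """Return (best_k, chi2_min) over offsets k of sum_j [wave[k+j] - (p0 + p1*j + p2*j*j)]**2.

    All residuals have unit weight; the arbitrary 'Factor' of footnote 10 is omitted
    because it rescales chi2 without moving its minimum.
    """
    p0, p1, p2 = p
    best_k, best_chi2 = None, float("inf")
    for k in offsets:
        chi2 = sum((wave[k + j] - (p0 + p1 * j + p2 * j * j)) ** 2 for j in range(n_bins))
        if chi2 < best_chi2:
            best_k, best_chi2 = k, chi2
    return best_k, best_chi2


if __name__ == "__main__":
    p = (0.0, 0.01, 0.0002)        # hypothetical leading-edge template (normalized units)
    n_bins, true_k, bin_ps = 50, 37, 10.0
    wave = [0.0] * 200             # hypothetical normalized pulse on a 10 ps grid
    for j in range(n_bins):
        wave[true_k + j] = p[0] + p[1] * j + p[2] * j * j
    for i in range(true_k + n_bins, len(wave)):
        wave[i] = 1.0
    k, chi2 = chi2_scan(wave, p, n_bins, range(0, len(wave) - n_bins))
    print(f"best offset {k} bins -> t = {k * bin_ps:.0f} ps, chi2 = {chi2:.3g}")
```

The best offset multiplied by the bin width gives the pulse time (37 bins × 10 ps = 370 ps in this toy example).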
Fig. 11 A reference pulse for the χ² timing method is formed from a second-order polynomial fit to the leading edge of the average pulse shape profile. The fit is used as the reference pulse (template) in the χ² timing determination of Fig. 13a.
Fig. 14 Reference pulses for the χ² timing method, formed from: (a) a third-order polynomial fit to the peak of the average pulse shape profile, used as the reference pulse for the χ² timing in Fig. 15a; (b) a second-order polynomial fit to the very beginning of the leading edge of the average pulse shape profile, used as the reference pulse for the χ² timing in Fig. 15b.
It is not a priori obvious which portion of the pulse carries the most important information for precise timing. What matters is not only the S/N ratio of each sample (see footnote 12), but also fluctuations in the MCP amplification process. Since we do not currently have a reliable Monte Carlo simulation of the MCP, we decided to explore this question empirically. Figure 14 shows two additional ways to create reference pulses for the χ² method: using (a) only the peak region and (b) only the very beginning of the leading edge. In each case the optimum number of time bins used in the χ² calculation had to be retuned. Figure 15 shows the result of applying these analyses to the same data set: timing determination using the very beginning of the leading edge is slightly more accurate. The time resolution is ~0.9ps better than when using the peak region only, and ~0.4ps better than when using the entire leading edge. All of these χ² method results are better than the CFD timing algorithm, which yielded a resolution of ~16.2 ps (see Fig. 13b).

Fig. 15 Timing resolution determined with a χ² algorithm employing reference pulses made using (a) the peak region (see Fig. 14a); (b) the very first portion of the leading edge (see Fig. 14b).

Although it is appropriate to investigate various methods of timing determination at the R&D stage, in a final application one has to worry about the speed of the algorithm. From this point of view we believe that the CFD-based software algorithm is a very good candidate for future large-scale applications of waveform sampling electronics, as it is much faster than the χ² method. However, even the CFD-based algorithm has to be optimized for speed while preserving the timing performance. All results presented so far used an interpolation step size of 10ps.
In Fig. 16 we vary the spline interpolation period from 1ps to 312.5ps (312.5ps means no spline interpolation at all), while keeping the CFD algorithm the same. This is equivalent to simulating operation at different sampling frequencies. For interpolation periods between 1 and 100ps the time resolution is essentially unchanged. At 150ps the increase is very small, and at the last point, without spline interpolation (312.5ps), the time resolution remains excellent: σsingle_detector ~18.1 ps. From this we conclude that a very simple algorithm that is easy to implement in an FPGA (finding the maximum and interpolating linearly between two samples, i.e., without a spline fit) already provides almost ideal results, only 10% worse than the best possible resolution. Moreover, we note that there would be essentially no loss if the chip could sample at 6GSa/s. The ideal method, applied to data fully corrected for timing INL as described previously, gives σsingle_counter ~17.2ps for a CFD fraction in the range 0.2 to 0.3. Without the INL correction, the resolution worsens to 27.3ps. A simpler algorithm, easily adapted to an FPGA firmware implementation, gives a result only ~8.5% worse than the complex method.
Data taken over a period of three months using the same timing calibrations give comparable timing resolution. This validates the long-term stability of the timing INL pattern mentioned earlier.
D. Timing with the TARGET ASIC chip
Fig. 18a shows a TARGET ASIC evaluation board used in these tests. Like the WaveCatcher board described previously, it is compact and low power (typically 1.8W total during normal operation), and is operated via a USB 2.0 interface. Figure 18b shows a block diagram of the TARGET ASIC, together with a companion FPGA that provides state machine and timing control signals. The TARGET chip used in this test was run with a sampling interval of ~450ps/bin (~2.2 GSa/s). The input amplifier and sampling configuration was run in a mode where the analog BW into the storage array was approximately 150 MHz. After an on-chip terminator, the analog signal is buffered and copied to the matrix of 8 storage rows of 512 samples for each of the 16 input channels. Each row may be independently addressed to initiate a storage cycle. Each switched capacitor array (SCA) storage cell contains a capacitor and a comparator. The stored samples are converted using a Wilkinson ADC method, in which an applied voltage ramp converts the stored voltage into a transition time of the in-cell comparator. Encoding is performed by measuring the time interval between the start of the ramp and the comparator output transition. In a simple form of time-to-digital conversion, this interval is measured by counting the number of high-speed clock cycles elapsed [4, 5]. Figure 18c shows the "oscilloscope-like" software interface, developed using the wxWidgets toolkit, that was used with the TARGET waveform digitizer to set up this measurement.
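The Wilkinson conversion described above reduces to counting clock cycles until a common voltage ramp crosses the stored voltage. The sketch below models a single cell with an ideal linear ramp; the ramp slope, clock frequency and 12-bit range in the example are hypothetical values, not TARGET parameters.

```python
import math


def wilkinson_convert(v_cell, v_start, ramp_slope, f_clk, max_count):
    """Count clock cycles from ramp start until the ramp reaches the stored voltage.

    Idealized model: linear ramp v(t) = v_start + ramp_slope * t, comparator fires
    when v(t) >= v_cell, counter clocked at f_clk and saturating at max_count.
    """
    if v_cell <= v_start:
        return 0
    t_cross = (v_cell - v_start) / ramp_slope      # comparator transition time
    return min(math.floor(t_cross * f_clk), max_count)


if __name__ == "__main__":
    # Hypothetical parameters: 1 V/us ramp, 1 GHz counter, 12-bit range -> 1 mV LSB
    for v in (0.0, 0.2504, 1.0005, 5.0):
        print(v, wilkinson_convert(v, 0.0, 1e6, 1e9, 4095))
```

Because one ramp and one counter can be shared by many cells, each with its own comparator, all stored samples can be converted in parallel; the LSB size is the ramp slope divided by the clock frequency (1 mV in this example).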
11 Algorithm: (a) the data are first corrected for timing INL by assigning a corrected time Tc(j) to each sample j using a lookup table; (b) after pedestal subtraction, the pulse and its amplitude M are found; (c) the two samples, with indices j and j+1, between which the waveform crosses the level M*F (F is the fraction) are determined; (d) the waveform between them is approximated by a straight line. The time is then given by: Time = Tc(j) + (Tc(j+1) - Tc(j))*(M*F - V(j))/(V(j+1) - V(j)). Using this method we determine a single-counter resolution of 18.6ps rms (for F = 0.23).
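The four steps of footnote 11 map directly onto a few lines of code. In this sketch the corrected times Tc(j) are assumed to come from a precomputed INL lookup table, pulses are taken as positive after pedestal subtraction, and the crossing is searched backwards from the maximum (our choice, to avoid triggering on noise before the pulse); the function name is hypothetical.

```python
def simple_cfd_time(v, tc, fraction=0.23, pedestal=0.0):
    """Constant-fraction time with linear interpolation between two samples.

    v: raw samples (positive-going pulse assumed); tc: INL-corrected sample times,
    e.g. from a lookup table. Returns None if no crossing is found.
    """
    w = [x - pedestal for x in v]                        # (b) pedestal subtraction
    i_max = max(range(len(w)), key=w.__getitem__)        # (b) pulse maximum, M = w[i_max]
    level = fraction * w[i_max]                          # M*F
    for j in range(i_max - 1, -1, -1):                   # (c) walk back down the leading edge
        if w[j] < level <= w[j + 1]:
            # (d) straight line between samples j and j+1
            return tc[j] + (tc[j + 1] - tc[j]) * (level - w[j]) / (w[j + 1] - w[j])
    return None


if __name__ == "__main__":
    tc = [k * 312.5 for k in range(12)]                  # hypothetical corrected times (ps)
    v = [0, 0, 0, 0, 10, 30, 50, 60, 55, 40, 20, 10]     # hypothetical pulse (mV)
    print(simple_cfd_time(v, tc, fraction=0.25))         # 1328.125
```

Only a maximum search, one comparison loop and a single division are needed, which is what makes this variant attractive for FPGA firmware.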
Fig. 20 Illustration of the normalized pulses used for the CFD-based timing algorithm. Pulse peaks were determined from a parabolic fit to the peak region of the pulse.
The laser was run at 10Hz in this test. High voltage on the MCP-PMTs was set to 2.2 and 2.1kV, respectively, and the laser intensity adjusted to give a signal charge similar to that in the Fermilab test [1] . As with the earlier WaveCatcher analysis, the first step is to perform a spline interpolation with 10ps-bins. Fig.19 shows the MCP-PMT pulses, as measured by the TARGET chip and an oscilloscope, under conditions representative of those used in the Fermilab beam test [1] .
Once again, two timing determination methods were employed. The first is a software CFD method, which consists of normalizing the pulses to the same peak and using a constant-fraction threshold, usually set to 18-22% of the peak. We used a parabolic fit to a small region near the pulse peak to determine the peak amplitude, which was then used for the normalization. Figure 20 shows the normalized pulses used in the CFD algorithm. As before, the second method uses so-called reference pulse timing. In this method one first finds a reference pulse shape (see Fig. 21 for an example of such fits). The reference pulse is then stepped through a given normalized pulse, and a χ² is calculated over a number of bins whose optimal value needs to be tuned. We found the optimum to be 40-60 of the 10ps-wide bins, which corresponds to a time interval equivalent to the length of the pulse leading edge. Figure 22 shows the χ² calculated as a function of the timing step, and the resulting time distribution corresponding to the χ² minimum. Figure 23 shows the final timing distribution between start and stop for both the CFD and χ² timing methods. Again, we quote the resolution per counter by dividing the fitted result by √2. The reference timing method gives a slightly better result. The TARGET chip is externally triggered, which means that in these laser tests the pulse always appears in roughly the same position in the analog memory. Therefore, we did not expect the INL correction to have a large effect, and our data analysis confirms this expectation.
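The peak amplitude used for normalization can be estimated with the simplest form of such a parabolic fit: a parabola through the highest sample and its two neighbours. The sketch assumes uniform sample spacing and an interior maximum, and the function name is hypothetical; a least-squares fit over more points of the peak region would be a straightforward extension.

```python
def parabolic_peak(y, dt=1.0):
    """Peak time and amplitude from a parabola through the maximum sample and its neighbours."""
    i = max(range(1, len(y) - 1), key=y.__getitem__)   # interior maximum
    y0, y1, y2 = y[i - 1], y[i], y[i + 1]
    denom = y0 - 2.0 * y1 + y2
    if denom == 0.0:                                    # flat top: no curvature information
        return i * dt, y1
    delta = 0.5 * (y0 - y2) / denom                     # vertex offset in units of samples
    return (i + delta) * dt, y1 - 0.25 * (y0 - y2) * delta


if __name__ == "__main__":
    samples = [0.1, 0.4, 0.8, 0.95, 0.9, 0.6]           # hypothetical pulse near its peak
    t_peak, amp = parabolic_peak(samples, dt=10.0)      # 10 ps interpolation bins
    normalized = [s / amp for s in samples]              # pulse normalized to unit peak
    print(t_peak, amp)
```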
It should be noted that the analog bandwidth and sampling speed of the TARGET ASIC, as configured in these tests, are lower than those of the WaveCatcher (see Fig. 24). This was expected to result in worse timing performance compared with the WaveCatcher.
Fig. 24 Average pulse shapes of PMT signals recorded with the TARGET chip and the WaveCatcher board (the same Hamamatsu amplifier was used in both tests). The faster rise time observed with the WaveCatcher is due to its higher front-end bandwidth (see Table 1).
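A common rule of thumb links bandwidth to rise time: for a single-pole low-pass response, the 10-90% rise time is ln 9/(2π·BW) ≈ 0.35/BW. Applying it to the front-end bandwidths quoted above (assuming single-pole behavior, which real front ends only approximate) gives a rough sense of the difference between the two digitizers:

```python
import math


def rise_time_10_90(bandwidth_hz):
    """10-90% rise time of a single-pole low-pass response: ln(9)/(2*pi*BW), about 0.35/BW."""
    return math.log(9.0) / (2.0 * math.pi * bandwidth_hz)


if __name__ == "__main__":
    for name, bw in (("WaveCatcher front end", 500e6), ("TARGET, as configured", 150e6)):
        print(f"{name}: {rise_time_10_90(bw) * 1e9:.2f} ns")
```

This gives about 0.7 ns at 500 MHz and about 2.3 ns at 150 MHz. The rise time observed on the recorded pulses also includes the detector and amplifier, so the measured difference is expected to be smaller than this ratio.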
We propose a simple formula 12 for evaluating the timing resolution of the χ² method applied to waveform sampling. Although there are 4-6 samples on the leading edge, samples near the peak have higher weight, as their S/N ratio is higher. This probably explains why the χ² method is only slightly better than the CFD method.
12 The expected timing resolution of a χ² timing method with waveform sampling is $\sigma_t = \frac{1}{N}\sqrt{\sum_{i=1}^{N}\left[\frac{\sigma_{noise}(i)}{(dS/dt)_i}\right]^2} \approx t_r\,\frac{1}{N}\sqrt{\sum_{i=1}^{N}\left[\frac{1}{S_i/N_i}\right]^2}$, where N is the number of samples, $\sigma_{noise}(i)$ is the rms noise contribution of the i-th sample, $(dS/dt)_i$ is the derivative of the signal evaluated at sample i, $t_r$ is the pulse rise time, and $S_i/N_i$ is the signal-to-noise ratio evaluated at the i-th sample. Therefore the rise time $t_r$ and S/N are crucial variables for good timing resolution.
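The footnote 12 estimate, read as σt = (1/N)·√(Σ[σnoise(i)/(dS/dt)i]²), can be evaluated directly. The sketch below implements that first form with per-sample noise and slope values supplied by the user; for N samples with identical per-sample resolution it reduces to that resolution divided by √N. The example values and the function name are hypothetical.

```python
import math


def multi_sample_resolution(noise_rms, slopes):
    """sigma_t = (1/N) * sqrt(sum_i [noise_i / (dS/dt)_i]**2) over N samples."""
    if len(noise_rms) != len(slopes) or not slopes:
        raise ValueError("need one noise value per slope, and at least one sample")
    n = len(slopes)
    return math.sqrt(sum((s / d) ** 2 for s, d in zip(noise_rms, slopes))) / n


if __name__ == "__main__":
    # Hypothetical leading edge: 1 mV rms noise, slopes in V/s at 5 samples
    slopes = [2e8, 4e8, 5e8, 4e8, 2e8]
    sigma = multi_sample_resolution([1e-3] * 5, slopes)
    print(f"{sigma * 1e12:.2f} ps")
```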
CONCLUSION
We conclude that, when the MCP-PMTs are operated under the same conditions, timing results obtained by waveform digitizing with the WaveCatcher board are consistent with those of the combination of Ortec 9327 CFD, TAC588, and 14-bit ADC114 electronics. The TARGET chip results are worse due to (a) lower bandwidth and (b) a worse S/N ratio (see Table 1). We also conclude that the spline-fit-based CFD method yields a worse result than all of the reference pulses tried with the χ² timing method. Among the various portions of the pulse tried, the best χ² timing resolution was obtained using the very beginning of the leading edge. The CFD-based software algorithm is an excellent candidate for future large-scale applications, as it is much faster than the χ² method. For a simple and fast algorithm that does not require a spline fit, and would therefore be suitable for real-time processing in an FPGA, we find that locating the maximum and interpolating linearly between two leading-edge samples already gives very good results (within ~8% of the best result obtained).
We believe it is a very significant result that waveform digitizing electronics can achieve timing resolutions similar to those of the best commercially available Ortec CFD/TAC/ADC electronics. It will help to advance the TOF technique, particularly for large-scale systems.
In summary, we note that similar conclusions about the exquisite timing achievable with waveform digitizing techniques were reached in Ref. [9], where the authors compared simulations with measurements using an 18GHz BW oscilloscope operating at 40GSa/s.

Note: * The noise is the baseline noise measured before the pulse. The signal is defined as the average of the signal peak. ** The large cross-talk is due to inductive coupling in the wire bonds.
