The CLARO-CMOS is an application specific integrated circuit (ASIC) designed for fast photon counting with pixellated photodetectors such as multi-anode photomultiplier tubes (Ma-PMT), micro-channel plates (MCP), and silicon photomultipliers (SiPM). The first prototype has four channels, each with a charge sensitive amplifier with settable gain and a discriminator with settable threshold, providing fast hit information for each channel independently. The design was realized in a long-established, stable and inexpensive 0.35 µm CMOS technology, and provides outstanding performance in terms of speed and power dissipation. The prototype consumes less than 1 mW per channel at low rate, and less than 2 mW at an event rate of 10 MHz per channel. The recovery time after each pulse is less than 25 ns for input signals up to a factor of 10 above threshold. The input-referred RMS noise is about 7.7 ke⁻ (1.2 fC) with an input capacitance of 3.3 pF. Thanks to the low noise and high speed, a timing resolution down to 10 ps RMS was measured for typical photomultiplier signals of a few million electrons, corresponding to the single photon response of these detectors.
1 Introduction
The fast counting of photons down to the single photon level is a basic requirement shared by several applications, ranging from particle identification in fundamental physics to imaging of biological processes in nuclear medicine. Many of these applications require pixellated photodetectors with pixels of a few square millimeters, often placed side by side to increase the total photosensitive area. The total number of pixels can be very large, up to the order of $10^5$. Ring imaging Cherenkov (RICH) detectors are among the most demanding cases: the photodetectors are usually arranged to form planes of up to a few square meters, ideally with no dead space between pixels.
Among the photodetectors that may be employed, multi-anode photomultiplier tubes (Ma-PMT) are most often the baseline for RICH detectors, thanks to their negligible dark count rate. New time of flight (TOF) detector designs often employ light sensors with superior time resolution, such as microchannel plates (MCP). Scintillator-based detectors usually generate a larger number of photons per event, and can thus take advantage of light detectors with a higher dark count rate but lower cost, such as silicon photomultipliers (SiPM). From the point of view of the readout electronics, the signals from these photodetectors have very similar characteristics, and the same readout circuits can be used with minor adjustments. The typical photomultiplier gain is of the order of $10^6$, and the expected pixel capacitance is of the order of a few pF. The charge collection time is short, of the order of one nanosecond. Other kinds of photosensors exist that require different design solutions for the readout electronics, but they are not considered here.
The main challenges in the realization of the electronic readout of such systems stem from the large number and close packing of the readout channels, which require low power dissipation to minimize cooling issues. Other frequent requirements are the ability to sustain high count rates or to perform precise timing measurements. Both call for wide bandwidth, which conflicts with low power dissipation. Wide bandwidth also requires minimizing the capacitance between pixels, which can be a major source of crosstalk. To minimize the interconnect capacitance, the front-end electronics must be placed as close as possible to the photosensors, which also helps in minimizing noise, but this again raises issues of power dissipation and cooling. These trade-offs need to be tuned to the specific requirements of each application.
Several application specific integrated circuits (ASIC) for photodetector readout are already available, covering a wide range of applications [1, 2, 3, 4, 5, 6, 7]. For instance, an ASIC suitable for timing measurements with a resolution of 20 ps RMS is the NINO [1], designed in IBM 0.25 µm CMOS technology, with a power consumption of 27 mW per channel. On the other hand, an ASIC for fast photon counting with lower power consumption is the MAROC [2], designed in AMS 0.35 µm SiGe-BiCMOS technology, which consumes about 5 mW per channel but was not designed for precise timing measurements.
The technological advances driven by digital electronics and by commercial portable communication devices, which also require wide bandwidth at low power, can bring significant improvements to fast photodetector readout. Nevertheless, careful design in a rather aged (and inexpensive) pure CMOS technology, such as the AMS 0.35 µm process, can still yield excellent results at the cutting edge of timing performance and low power. This is the aim of the CLARO-CMOS, the first prototype of an ASIC for photodetector readout presented in this paper. Figure 1 shows a photograph of the 4-channel ASIC. The die area is 2 × 2 mm². The radiation hardness of the technology adopted is expected to be adequate for most accelerator and space environments [8, 9]. However, the effects of radiation on the circuit performance also depend on the design and layout of a given device. The radiation hardness of the CLARO-CMOS prototype will be measured in the near future and is not considered in this paper.
2 Design of the prototype
Figure 2 shows the block diagram of a channel of the CLARO-CMOS. The ASIC is designed to operate between a positive 2.5 V supply rail and ground. The charge sensitive amplifier (CSA) converts the input current pulse into a voltage signal, which is AC coupled to a PMOS follower and to a discriminator (a voltage comparator). The threshold of the discriminator is set by the programmable static voltage at the non-inverting input of the comparator. The schematics of the charge sensitive amplifier and of the comparator are described in detail in the following.
As will become clear later, the DC voltage at the output of the CSA is close to the positive rail and is not stable against temperature variations. For these reasons the AC coupling shown in figure 2 was introduced: the DC voltage at the inverting input of the comparator is held halfway between the positive rail and ground, independently of temperature. The AC coupling time constant is 55 ns. Since, as will be shown, the signals at the output of the CSA are very fast, the AC coupling causes no noticeable baseline shift unless the rate exceeds about 10 MHz.
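The size of the baseline shift can be estimated roughly. The sketch below assumes that the CSA pulse has the usual two-exponential shape with rise constant $\tau_R$ and fall constant $\tau_F$, that pulses arrive at a steady average rate, and that the AC coupling passes no DC and is much slower than the pulse. Under these assumptions the mean of the coupled signal must be zero, so (by Campbell's theorem for the mean of a pulse train) the baseline moves by the rate times the pulse area. The function names and the choice $\tau_R = 1$ ns are illustrative, not taken from the design.

```python
import math


def peak_factor(tau_r: float, tau_f: float) -> float:
    """Peak of the assumed pulse tau_f/(tau_f - tau_r) * (exp(-t/tau_f) - exp(-t/tau_r))."""
    return (tau_r / tau_f) ** (tau_r / (tau_f - tau_r))


def relative_baseline_shift(rate_hz: float, tau_r: float, tau_f: float) -> float:
    """Mean baseline shift behind an ideal AC coupling, as a fraction of the pulse peak.

    A high-pass network passes no DC, so at a steady rate the coupled signal has
    zero mean: the baseline moves by rate * (pulse area). The area of the assumed
    pulse, in units of Q/C_F, is exactly tau_f.
    """
    return rate_hz * tau_f / peak_factor(tau_r, tau_f)


for rate in (1e6, 10e6):
    shift = relative_baseline_shift(rate, tau_r=1e-9, tau_f=5e-9)
    print(f"{rate / 1e6:4.0f} MHz -> mean baseline shift {100 * shift:.1f}% of the pulse peak")
```

In this simple model the 55 ns coupling time constant does not enter the mean shift; it only sets how quickly the baseline follows changes in rate.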
The auxiliary output buffer, a small-area PMOS follower, is used primarily for debugging: it allows the signal at the inverting input of the discriminator to be measured without loading the output of the CSA. It must be biased by an external resistor tied to the positive supply voltage, and is meant to be switched off during normal operation, once the threshold is properly set and only the binary information at the output of the discriminator is read out. In all the power consumption measurements presented in the following, the analog buffer was off.
Gain and threshold are programmable through a 16-bit shift register, very similar to an SPI interface. The first 8 bits control channel 1: three bits control the gain of the CSA, as described in the following, and the remaining five bits control the resistive divider at the non-inverting input of the comparator. The second group of 8 bits controls channel 2. In this prototype, the settings of channels 3 and 4 are copied from those of channels 1 and 2.
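The bit order inside each 8-bit group is not specified here, so the packing below is only a sketch under a hypothetical layout: $V_3$, $V_4$ and $V_F$ in the three most significant bits of each byte, followed by the 5-bit threshold code, with the channel 1 byte shifted in first and every byte sent most significant bit first. All function names are illustrative.

```python
def channel_byte(v3: int, v4: int, vf: int, threshold: int) -> int:
    """Pack one channel: HYPOTHETICAL bit order [V3, V4, VF, T4..T0], MSB first."""
    for name, bit in (("V3", v3), ("V4", v4), ("VF", vf)):
        if bit not in (0, 1):
            raise ValueError(f"{name} must be 0 or 1")
    if not 0 <= threshold <= 31:
        raise ValueError("threshold code must be in 0..31")
    return (v3 << 7) | (v4 << 6) | (vf << 5) | threshold


def config_word(ch1_byte: int, ch2_byte: int) -> int:
    """HYPOTHETICAL: channel 1 occupies the first 8 bits shifted in (upper byte)."""
    return (ch1_byte << 8) | ch2_byte


def bits_msb_first(word: int, n_bits: int = 16) -> list:
    """Bit sequence to clock into the shift register, most significant bit first."""
    return [(word >> i) & 1 for i in range(n_bits - 1, -1, -1)]


# Channel 1: maximum gain (V3 = V4 = 0, VF = 1), threshold code 6.
# Channel 2: both attenuators enabled, VF = 0, threshold code 2.
word = config_word(channel_byte(0, 0, 1, 6), channel_byte(1, 1, 0, 2))
print(hex(word), bits_msb_first(word))
```

On the prototype the same word also configures channels 3 and 4, since their settings are copied from channels 1 and 2.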
The design of the CLARO-CMOS is optimized for negative input charge signals, that is, the ASIC is designed for photodetectors where electrons are collected at the readout electrode. To accommodate photodetectors whose signals are made of holes, as for some SiPM models, the same CSA design could be mirrored by replacing all NMOS transistors with PMOS transistors and vice versa, and the threshold settings would have to be changed accordingly.
Design of the CSA
Figure 3 shows a simplified schematic of the CSA, which for clarity includes the parasitic capacitance $C_L$ and the input capacitance $C_I$. The input stage is an active cascode [10, 11, 12], a design widely used in photodetector electronics, also referred to as super common base [13, 14, 15]. This design uses local feedback through $N_1$ to lower the impedance at the source of $N_2$, so that the input current pulses are read on a virtual ground node. The loop gain at intermediate frequencies is $g_1 R_C$, where $g_1$ is the transconductance of $N_1$. The current pulses are integrated by the capacitor $C_F$ at the drain of $N_2$, which discharges through the resistor $R_F$. The output signal in response to a (negative) charge $Q$ injected at $t = 0$ is given by
$$V_{out}(t) = \frac{Q}{C_F}\,\frac{\tau_F}{\tau_F - \tau_R}\left(e^{-t/\tau_F} - e^{-t/\tau_R}\right), \qquad t \ge 0 \tag{1}$$
where $\tau_R$ is the rise time constant set by the CSA bandwidth and $\tau_F = C_F R_F$ is the fall time constant. The rise time constant $\tau_R$ is of the order of 1 ns and, as will be shown, is directly proportional to $C_I$. The ASIC is designed for fast photodetectors, where the input current pulse is short, of the order of 1 ns. The fall time constant $\tau_F$ was chosen to be 5 ns, long enough for an effective integration of fast pulses but short enough to sustain high rates without pile-up.
In the simplified scheme of figure 3, the main voltage (or series) noise source is $N_1$ together with its bias circuit $I_1$, while the main current (or parallel) noise source is $R_F$ together with the bias circuit $I_2$. Transistor $N_2$ also contributes to the series noise, but its contribution is divided by the loop gain and becomes negligible. The optimal noise performance is obtained when $N_1$ is biased with a large current $I_1$, which keeps its transconductance high and its series noise low. Since $R_F$ contributes to the parallel noise, its value cannot be too small, and this sets an upper limit on the bias current $I_2$ of $N_2$. With a low bias current the transconductance $g_2$ of $N_2$ is low, and the input capacitance to ground $C_I$, due to the input bonding pad, the bonding wire, the package, the interconnects and the sensor, adds a pole to the input feedback loop at the frequency $g_2/2\pi C_I$. If $R_C$ and $C_C$ were not present, the load at the drain of $N_1$ would be purely capacitive, and $C_L$ would introduce another pole at a very low frequency, which would then be the dominant pole of the feedback loop. At the frequency of the second pole, $g_2/2\pi C_I$, the feedback loop would become unstable unless the loop gain had already dropped below unity, in which case the loop would be ineffective in lowering the input impedance at that frequency. This case is illustrated by the dashed line in the Bode plot of figure 4.
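Assuming the CSA response has the usual two-pole form, with rise constant $\tau_R$ and fall constant $\tau_F$, the time and height of the peak follow in closed form. The sketch below, with illustrative function names, evaluates them for $\tau_R = 1$ ns and $\tau_F = 5$ ns, the orders of magnitude quoted above. The peak stays below $Q/C_F$ because the feedback network starts discharging $C_F$ before the rise is complete (ballistic deficit).

```python
import math


def pulse(t: float, tau_r: float, tau_f: float) -> float:
    """Assumed CSA output in units of Q/C_F for a charge injected at t = 0."""
    if t < 0:
        return 0.0
    return tau_f / (tau_f - tau_r) * (math.exp(-t / tau_f) - math.exp(-t / tau_r))


def peak_time(tau_r: float, tau_f: float) -> float:
    """Time at which the derivative of the pulse vanishes."""
    return tau_r * tau_f / (tau_f - tau_r) * math.log(tau_f / tau_r)


def peak_value(tau_r: float, tau_f: float) -> float:
    """Closed-form peak height in units of Q/C_F (always below 1)."""
    return (tau_r / tau_f) ** (tau_r / (tau_f - tau_r))


tau_r, tau_f = 1.0, 5.0  # ns
tp = peak_time(tau_r, tau_f)
print(f"peak at {tp:.2f} ns, height {peak_value(tau_r, tau_f):.3f} x Q/C_F")
```

With these values the peak occurs at about 2.0 ns and reaches about 0.67 of $Q/C_F$.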
To compensate for the pole due to $C_L$, $R_C$ and $C_C$ are used. This case is illustrated by the solid line in figure 4. The effect of the compensation is to limit the loop gain to $g_1 R_C$ at frequencies above $1/2\pi C_C R_C$. This shifts the pole due to $C_L$ to a higher frequency, $1/2\pi C_L R_C$. For this compensation to be effective, $R_C$ must not be too large and $C_L$ must be minimized with a proper layout. In particular, since $C_C$ occupies a larger silicon area than $R_C$, its parasitic capacitance to the substrate is larger; a much lower $C_L$ is therefore obtained if $R_C$ is placed before $C_C$, as in figure 3. The relatively low value of $R_C$ strengthens the need to keep the transconductance of $N_1$ high, while the transconductance of $N_2$ is less critical. As illustrated by the solid line in figure 4, the dominant pole of the input feedback loop is now at $g_2/2\pi C_I$. This ensures that the feedback loop lowers the input impedance up to a much higher frequency. The frequency where the loop gain approaches unity gives the bandwidth of the CSA, and the associated time constant gives the rise time of the output signal:
The 10% to 90% rise time is $2.2\tau_R$. The rise time is thus directly proportional to the input capacitance $C_I$ and inversely proportional to the loop gain $g_1 R_C$. The feedback loop remains stable even if the sensor capacitance is negligible, since $C_I$ cannot fall below a few pF: it includes the gate-drain capacitance of $N_1$, which is less than 100 fF but is multiplied by the loop gain, the gate-source and gate-bulk capacitances of $N_1$ (about 0.5 pF in total), and the stray capacitance of the pads, the bonding wires, etc. Considering all the contributions from the circuit, the input capacitance is estimated to be about 1.5 pF, bonding pads excluded. With the CLARO-CMOS mounted in a small QFN48 package, the total capacitance at the input (without the sensor) was measured to be about 3.3 pF.
The full schematic of the CSA is shown in figure 5. To vary the gain, a set of MOS switches was included in the design. Two switches, $N_{S3}$ and $N_{S4}$, are used to attenuate the input signal: if the digital control signal $V_3$ or $V_4$ is set high, the corresponding switch is closed, and part of the input charge flows through $N_3$ or $N_4$ and is dumped on the positive rail. The attenuation $B$ is set by the dimensions of $N_3$ and $N_4$, which are 3 and 6 times larger than $N_2$ respectively, giving $B = 4$ and $B = 7$. An attenuation of $B = 10$ is obtained if both branches are enabled. The dummy switch $N_{S2}$, whose gate is tied to the positive rail, was introduced to preserve the symmetry between the input branches.
Another switch, $P_{SF}$, controlled by the digital signal $V_F$, is used to change the values of $C_F$ and $R_F$, doubling $C_F$ and halving $R_F$, so that the gain changes by a factor of 2 while the discharge time constant stays the same. The voltages $V_3$, $V_4$ and $V_F$ are the three control bits that set the gain of each channel. Only one switch was used to change $C_F$ and $R_F$ because of the switch parasitics. If several switches had been connected in series, their series resistance in the "on" state would have distorted the shape of the output signal. If several such switches had been connected in parallel, their capacitance in the "off" state would have added to $C_F$, reducing the maximum achievable gain.
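The eight gain settings can be tabulated with a simple model. The sketch assumes that the input current divides among the enabled input branches in proportion to their width ratios (1 : 3 : 6 for $N_2$, $N_3$, $N_4$), which reproduces $B = 4$, 7 and 10, and that $V_F = 0$ closes $P_{SF}$, doubling $C_F$ and halving the charge gain (consistent with the maximum gain being quoted for $V_F = 1$). Gains are relative to the maximum setting; the names are illustrative.

```python
from itertools import product

# Width ratios of the input branches relative to N2 (from the text).
WIDTH = {"N2": 1, "N3": 3, "N4": 6}


def attenuation(v3: int, v4: int) -> float:
    """N2 receives W2 / (sum of enabled widths) of the input current; B is the inverse."""
    total = WIDTH["N2"] + v3 * WIDTH["N3"] + v4 * WIDTH["N4"]
    return total / WIDTH["N2"]


def relative_gain(v3: int, v4: int, vf: int) -> float:
    """Charge gain relative to the maximum setting (V3 = V4 = 0, VF = 1)."""
    cf_factor = 1 if vf == 1 else 2  # assumed: VF = 0 doubles C_F
    return 1.0 / (attenuation(v3, v4) * cf_factor)


# List the eight settings from highest to lowest gain.
for v3, v4, vf in sorted(product((0, 1), repeat=3), key=lambda s: -relative_gain(*s)):
    print(f"V3={v3} V4={v4} VF={vf}  B={attenuation(v3, v4):4.1f}  "
          f"gain={relative_gain(v3, v4, vf):.3f}")
```

The resulting settings span a factor of 20 in gain, from the maximum down to $B = 10$ with doubled $C_F$.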
The dimensions of the bias transistors $N_{B1}$ to $N_{B5}$ were chosen so that the bias current of $N_1$ is about 2.5 times larger than that of $N_2$. Transistor $N_1$ has a very large area to obtain a high transconductance $g_1$. In this prototype the bias current of the CSA can be set by changing $I_A$ with an external resistor. Two operating modes were chosen: a "low power" mode, with $I_A = 2$ µA, and a "timing" mode, with $I_A = 5$ µA. In "low power" mode, $N_1$ is biased with 85 µA, resulting in $g_1 = 2$ mA/V. Since $R_C = 10$ kΩ, the low-frequency gain of the input feedback loop is about 20. The input branches with $N_2$, $N_3$ and $N_4$ are biased with a total current of 25 µA. The total transconductance of $N_2$ in parallel with $N_3$ and $N_4$ is about 350 µA/V, depending on which of $N_3$ and $N_4$ are enabled. Without the feedback loop, the input impedance would be higher than 2 kΩ; the feedback loop lowers it to about 130 Ω. From equation 2, the 10% to 90% rise time is expected to be about 1.2 ns for $C_I = 3.3$ pF, and 2.4 ns for $C_I = 6.5$ pF. In "timing" mode, $N_1$ is biased with 170 µA and its transconductance becomes 3.8 mA/V, so that the loop gain roughly doubles. The total transconductance of $N_2$, $N_3$ and $N_4$ is about 500 µA/V. Thanks to the larger loop gain, the input impedance is reduced to less than 100 Ω. The bandwidth of the CSA increases, and the loop gain at $1/2\pi C_L R_C$ comes closer to unity, but the stability of the feedback loop is still ensured even with a negligible sensor capacitance. The rise time of the signal at the output of the CSA given by equation 2 is roughly half that in "low power" mode, thanks to the larger loop gain. The main consequence is a reduction of the time walk of the discriminator, as shown in the following.
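A first-order check of these figures is sketched below. It assumes that the feedback divides the open-loop input impedance $1/g_2$ by $(1 + g_1 R_C)$ and that $\tau_R \approx C_I Z_{in}$; this is a simplification, not equation 2. It reproduces the input impedances above but gives rise times roughly 20% shorter than the quoted 1.2 ns and 2.4 ns, so it should be read only as an order-of-magnitude check. The dictionary and function names are illustrative.

```python
R_C = 10e3  # ohm

MODES = {
    # mode: (g1 [A/V], total transconductance of the input branches [A/V]), quoted values
    "low power": (2.0e-3, 350e-6),
    "timing": (3.8e-3, 500e-6),
}


def input_impedance(g1: float, g2: float) -> float:
    """Open-loop 1/g2 reduced by (1 + loop gain); assumed first-order model."""
    return 1.0 / (g2 * (1.0 + g1 * R_C))


def rise_time_10_90(g1: float, g2: float, c_in: float) -> float:
    """10%-90% rise time 2.2 * tau_R, with tau_R approximated by C_I * Z_in."""
    return 2.2 * c_in * input_impedance(g1, g2)


for mode, (g1, g2) in MODES.items():
    z = input_impedance(g1, g2)
    print(f"{mode:9s}: loop gain {g1 * R_C:4.0f}, Z_in {z:5.0f} ohm, "
          f"t_r(3.3 pF) {rise_time_10_90(g1, g2, 3.3e-12) * 1e9:.2f} ns, "
          f"t_r(6.5 pF) {rise_time_10_90(g1, g2, 6.5e-12) * 1e9:.2f} ns")
```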
The noise of the CSA can be referred to the input as an equivalent noise charge (ENC). The detailed noise calculations are given in appendix A.1. For $\tau_R < 0.3\,\tau_F$, that is for $C_I < 10$ pF in "low power" mode, the ENC is given by
where $i_n$ is the current noise density, $e_n$ is the white voltage noise density and $A_f$ is the 1/f voltage noise coefficient. In addition to the noise from $N_1$ and $R_F$, it is necessary to consider the noise contributions of the bias transistor $N_{B2}$, whose current noise contributes directly to the parallel noise at the input, and of $P_{B5}$, whose current noise is divided by the transconductance of $N_1$ and becomes a series noise contribution at the input. Moreover, if the filtering capacitors $C_{B2}$ and $C_{B5}$ are not large enough, additional noise from $N_{B1}$, $N_{B3}$ and $P_{B4}$ can be injected through $N_{B2}$ and $P_{B5}$, contributing to the parallel and series noise respectively. In this first CLARO-CMOS prototype, the filter capacitors $C_{B2}$ and $C_{B5}$ are not present. The parallel noise is dominated by the channel noise current of $N_{B1}$, mirrored and multiplied by 10 by $N_{B2}$. Since in "low power" mode the transconductance of $N_{B1}$ is $g_{B1} = 35$ µA/V, we have
Other contributions come from $N_{B2}$, about 2 pA/√Hz, and from $R_F$, about 0.9 pA/√Hz if $V_F = 1$ and 1.3 pA/√Hz if $V_F = 0$, assuming $B = 1$. The weight of the noise generated by $R_F$ is directly proportional to the attenuation factor $B$:
at the maximum attenuation, that is with $B = 10$, the noise from $R_F$ becomes the dominant parallel noise source, with 9 pA/√Hz if $V_F = 1$ and 13 pA/√Hz if $V_F = 0$. The input-referred contributions of the other noise sources in the CSA do not depend on $B$, since they are attenuated together with the signal. In any case, the attenuation is meant to be used only with large signals, for which the signal-to-noise ratio is expected to remain adequate. In the following, all noise evaluations assume $B = 1$. The total parallel noise is then close to 7 pA/√Hz in "low power" mode. In "timing" mode the parallel noise increases by about 20%, because the larger bias current gives a larger transconductance to $N_{B1}$ and $N_{B2}$.
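The parallel noise budget can be reproduced approximately by summing uncorrelated sources in quadrature. The sketch assumes channel thermal noise $\sqrt{4kT\gamma g_m}$ with $\gamma = 2/3$ and $T = 300$ K, and that $N_{B2}$, carrying ten times the current of $N_{B1}$, has $g_{B2} \approx 10\,g_{B1}$; these are assumptions, not values from the text. The $R_F$ term is taken directly from the quoted 0.9 pA/√Hz and scaled by $B$. With $B = 1$ the total comes out near 6.6 pA/√Hz, close to the quoted 7 pA/√Hz.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant, J/K
T = 300.0           # assumed temperature, K
GAMMA = 2.0 / 3.0   # assumed channel thermal-noise factor


def mos_current_noise(gm: float) -> float:
    """Channel thermal-noise current density sqrt(4 k T gamma gm), in A/sqrt(Hz)."""
    return math.sqrt(4 * K_B * T * GAMMA * gm)


def quadrature(*terms: float) -> float:
    """Uncorrelated noise sources add in quadrature."""
    return math.sqrt(sum(x * x for x in terms))


g_b1 = 35e-6                          # A/V, "low power" mode
i_nb1 = 10 * mos_current_noise(g_b1)  # N_B1 noise, mirrored and multiplied by 10
i_nb2 = mos_current_noise(10 * g_b1)  # assumes g_B2 ~ 10 g_B1
i_rf = 0.9e-12                        # R_F term quoted for V_F = 1, B = 1

for b in (1, 10):
    i_n = quadrature(i_nb1, i_nb2, b * i_rf)  # weight of R_F noise scales with B
    print(f"B = {b:2d}: i_n ~ {i_n * 1e12:.1f} pA/sqrt(Hz)")
```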
The series noise is dominated by $N_1$ and $P_{B5}$. As already mentioned, additional noise from the other bias transistors is injected through $P_{B5}$, since its gate is not filtered. In "low power" mode, where $g_1 = 2$ mA/V, the series white noise is dominated by $N_{B1}$, $N_{B3}$ and $P_{B4}$, which all have a transconductance of about $g_{B1} = 35$ µA/V. The resulting white voltage noise at the input is
where 25 is the area ratio between $P_{B5}$ and $P_{B4}$. Other contributions come from $N_1$, about 2.3 nV/√Hz, and from $P_{B5}$, about 1.6 nV/√Hz. The total series white noise is about $e_n \approx 14$ nV/√Hz. In "timing" mode the series noise is reduced by almost a factor of 2, because the larger transconductance of $N_1$ gives a larger loop gain. The contribution of the 1/f component is expected to be negligible compared to the series white noise, since simulations give $A_f < 10^{-9}$ V². According to equations 2 and 3, the parallel noise contribution to the ENC at the output of the CSA is expected to be about 1.8 ke⁻ (0.29 fC) at $C_I = 3.3$ pF and 2.0 ke⁻ (0.32 fC) at $C_I = 6.5$ pF. The series noise contribution is expected to be about 7.5 ke⁻ (1.2 fC) at $C_I = 3.3$ pF and 12 ke⁻ (1.9 fC) at $C_I = 6.5$ pF. The total noise of the CSA is thus expected to be 7.7 ke⁻ (1.2 fC) at $C_I = 3.3$ pF and 12 ke⁻ (1.9 fC) at $C_I = 6.5$ pF. At the auxiliary output, the rise time is limited by the bandwidth of the analog buffer. In that case, according to equation 3, the weight of the series noise is expected to be smaller and that of the parallel noise larger. For instance, assuming that the output buffer shapes the output signals with time constants $\tau_R = 1.3$ ns and $\tau_F = 7.2$ ns, equation 3 gives 5.6 ke⁻ (0.89 fC) with an input capacitance of 3.3 pF, dominated by the series noise.
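The series budget and the ENC values above can be reproduced approximately with the sketch below. It rests on several assumptions: channel thermal noise $4kT\gamma g_m$ with $\gamma = 2/3$ and $T = 300$ K, $g_{B5} \approx 25\,g_{B1}$, the two-pole pulse shape, white noise only (1/f neglected), and the exact white-noise integrals for that shape instead of equation 3, which the text gives for $\tau_R < 0.3\,\tau_F$. $\tau_R$ is derived from the quoted 10% to 90% rise times (1.2 ns and 2.4 ns divided by 2.2). With these inputs the results land within roughly 10% of the quoted values. Names are illustrative.

```python
import math

K_B = 1.380649e-23     # J/K
Q_E = 1.602176634e-19  # C
T = 300.0              # assumed temperature, K
GAMMA = 2.0 / 3.0      # assumed channel thermal-noise factor
FOUR_KT_GAMMA = 4 * K_B * T * GAMMA

# Series white-noise budget in "low power" mode.
g1, g_b1 = 2e-3, 35e-6
e_bias = 25 * math.sqrt(3 * FOUR_KT_GAMMA * g_b1) / g1  # N_B1, N_B3, P_B4 through P_B5 (x25)
e_n1 = math.sqrt(FOUR_KT_GAMMA / g1)                     # N_1 itself
e_pb5 = math.sqrt(FOUR_KT_GAMMA * 25 * g_b1) / g1        # P_B5, assuming g_B5 ~ 25 g_B1
e_budget = math.sqrt(e_bias**2 + e_n1**2 + e_pb5**2)


def enc_two_pole(e_n, i_n, c_in, tau_r, tau_f):
    """White-noise ENC in coulombs (total, parallel, series) for the assumed two-pole CSA.

    Uses one-sided densities and the exact integrals of |H(f)|^2 for
    H(s) ~ 1 / ((1 + s tau_r)(1 + s tau_f)), normalised to the pulse peak.
    """
    k = (tau_r / tau_f) ** (tau_r / (tau_f - tau_r))  # pulse peak per unit Q/C_F
    s = tau_r + tau_f
    par = math.sqrt(i_n**2 * tau_f**2 / (4 * s)) / k
    ser = math.sqrt(e_n**2 * c_in**2 * tau_f / (4 * tau_r * s)) / k
    return math.hypot(par, ser), par, ser


print(f"series budget: e_n ~ {e_budget * 1e9:.1f} nV/sqrt(Hz)")
for c_in, t_10_90 in ((3.3e-12, 1.2e-9), (6.5e-12, 2.4e-9)):
    tot, par, ser = enc_two_pole(14e-9, 7e-12, c_in, t_10_90 / 2.2, 5e-9)
    print(f"C_I = {c_in * 1e12:.1f} pF: ENC ~ {tot / Q_E / 1e3:.1f} ke- "
          f"(parallel {par / Q_E / 1e3:.1f}, series {ser / Q_E / 1e3:.1f})")
```

For the buffered auxiliary output, `enc_two_pole(14e-9, 7e-12, 3.3e-12, 1.3e-9, 7.2e-9)` gives roughly 5.9 ke⁻, near the 5.6 ke⁻ quoted.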
As already discussed, the filtering capacitors $C_{B2}$ and $C_{B5}$ can be used to improve the noise performance of the design, considerably reducing both the series and the parallel noise injected through the bias transistors, at the price of a larger layout area on silicon. This improvement will be considered for the next versions of the ASIC.
Design of the comparator
Figure 6 shows the schematic of the comparator. The input stage is a differential pair loaded with a current mirror; this is the only part of the comparator that draws a continuous current. Since $I_C$ is about 1 µA, the differential pair is biased with about 100 µA. The signal from the CSA is connected to the inverting input of the comparator, while the non-inverting input is held at a constant potential that defines the threshold. The threshold voltage at the non-inverting input of the discriminator can be set between 1.25 V (half the positive rail voltage) and 0.83 V (one third of the positive rail voltage) in 32 steps, labelled from 0 to 31, by a 5-bit DAC implemented as a simple voltage divider. Each step is about 13 mV; at the maximum gain, this corresponds to a threshold step of 150 ke⁻ (24 fC). In the ready state, the output of the differential pair is low, close to 0.5 V. This signal drives the inverter made of $P_8$ and $N_8$. Transistor $N_8$ is small and has a large threshold voltage, about 0.6 V, so that it is biased just below threshold: no current flows through the first inverter and its output is high. Transistor $P_H$ provides hysteresis; since its gate is high, it is switched off. The output of $P_8$ and $N_8$ drives the second inverter, made of $P_9$ and $N_9$, which is also the output stage.
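The threshold DAC can be modelled as a linear divider. The code-to-voltage mapping below is hypothetical: it assumes that code 0 gives half the rail (1.25 V, the AC-coupled baseline) and code 31 gives one third of the rail, with equal steps in between, which reproduces the quoted step of about 13 mV. The real divider may include an offset, and the names are illustrative.

```python
VDD = 2.5          # V, positive supply rail
V_HIGH = VDD / 2   # 1.25 V, also the AC-coupled baseline at the inverting input
V_LOW = VDD / 3    # ~0.83 V
N_CODES = 32       # 5-bit DAC, codes 0..31
STEP = (V_HIGH - V_LOW) / (N_CODES - 1)


def threshold_voltage(code: int) -> float:
    """Hypothetical linear mapping: code 0 -> VDD/2, code 31 -> VDD/3."""
    if not 0 <= code < N_CODES:
        raise ValueError("threshold code must be in 0..31")
    return V_HIGH - code * STEP


def distance_from_baseline(code: int) -> float:
    """How far below the baseline a negative CSA pulse must swing to fire the comparator."""
    return V_HIGH - threshold_voltage(code)


print(f"step = {STEP * 1e3:.1f} mV")
for code in (0, 2, 6, 31):
    print(f"code {code:2d}: {threshold_voltage(code):.3f} V "
          f"({distance_from_baseline(code) * 1e3:.0f} mV below baseline)")
```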
In response to a negative pulse from the CSA, the output of the differential pair rises close to the positive rail. The output of the first inverter goes to ground, turning on $P_H$, which draws current from the differential pair and holds its output up, providing hysteresis. At the same time, the output of $P_9$ and $N_9$ swings to the positive rail. The gate length of $P_H$ is large: its "on" resistance is about 150 kΩ, so that only a fraction of the bias current of the differential pair flows through $P_H$, and after a few nanoseconds the output of the differential pair is able to return to its initial condition. When the output of the differential pair goes down, the output of the first inverter goes up, $P_H$ is turned off and the output of the comparator goes down. The discriminator is then ready to trigger on another pulse from the CSA. The width of the output pulses increases with the amplitude of the input signals, which allows time-over-threshold algorithms to be applied to determine the input charge and to compensate for time walk.
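The dependence of the output width and of the leading-edge timing on the amplitude can be illustrated with an idealized comparator (no hysteresis, no delay) applied to the assumed two-pole CSA pulse. The sketch, whose names and values $\tau_R = 1$ ns, $\tau_F = 5$ ns are illustrative, finds both threshold crossings by bisection. The time over threshold grows roughly logarithmically with amplitude, while the leading-edge crossing moves earlier (time walk). Without hysteresis the width shrinks towards zero just above threshold, unlike the real circuit, where $P_H$ holds the output for a few nanoseconds.

```python
import math


def shape(t: float, tau_r: float, tau_f: float) -> float:
    """Assumed two-pole CSA pulse, normalised to unit peak height."""
    k = (tau_r / tau_f) ** (tau_r / (tau_f - tau_r))
    return tau_f / (tau_f - tau_r) * (math.exp(-t / tau_f) - math.exp(-t / tau_r)) / k


def bisect(f, lo: float, hi: float, iters: int = 100) -> float:
    """Root of f in [lo, hi], assuming f(lo) and f(hi) have opposite signs."""
    f_lo = f(lo)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def crossings(amp_over_thr: float, tau_r: float = 1.0, tau_f: float = 5.0):
    """Leading-edge crossing time and time over threshold for an ideal comparator.

    amp_over_thr is the pulse peak divided by the threshold; returns None if the
    pulse never reaches threshold. Times are in the same units as tau_r and tau_f.
    """
    if amp_over_thr <= 1.0:
        return None
    t_peak = tau_r * tau_f / (tau_f - tau_r) * math.log(tau_f / tau_r)

    def excess(t: float) -> float:
        return amp_over_thr * shape(t, tau_r, tau_f) - 1.0

    t_on = bisect(excess, 0.0, t_peak)                    # rising edge
    t_off = bisect(excess, t_peak, t_peak + 50 * tau_f)   # falling edge
    return t_on, t_off - t_on


for a in (1.1, 2.0, 5.0, 10.0):
    t_on, tot = crossings(a)
    print(f"{a:4.1f} x threshold: crossing at {t_on:.2f} ns, time over threshold {tot:.2f} ns")
```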
The gain of the input stage of the comparator is about 30 V/V for small signals around threshold at low frequency, with a pole at about 30 MHz. The corresponding time constant is $\tau_C \approx 5$ ns, about the same as the fall time constant $\tau_F$ of the CSA pulse. Hysteresis increases the low-frequency gain of the input stage to 600 V/V. The gain of each inverter is about 20 V/V. The overall low-frequency gain of the comparator, including hysteresis, is thus $600 \times 20 \times 20 = 2.4 \times 10^5$ V/V, or 107 dB. Transistor $P_9$ is much larger than $N_9$, to obtain a very fast transition on the rising edge of the output. The rise and fall times of the output signal depend on the load at the output of the discriminator. The output stage was designed to drive only a short line to a digital processing circuit or to an external low-impedance driver, located a few cm away on the same board, so a purely capacitive load of a few pF is expected. This choice gives maximum flexibility in the design of a full system and avoids unnecessary power consumption in the CLARO-CMOS. The output transitions are limited by the slew rate of the output stage on the output load, $I_L/C_L$, where $I_L$ is the current from the output stage and $C_L$ is the output load capacitance. The output current can be estimated as $I_L \approx 2.5\,\mathrm{V} \times g_9$, where $g_9$ is the transconductance of the output stage. For small signals, the transconductance of $P_9$ is 2 mA/V and that of $N_9$ is 0.8 mA/V, although these values are strongly non-linear since the output stage swings from rail to rail. In any case, the rise time is expected to be about 2.5 times shorter than the fall time, since the rising edge is driven by $P_9$ while the falling edge is driven by $N_9$. With these numbers, the time required for a full swing from 0 V to 2.5 V at the output is about $2.5\,\mathrm{V}/(I_L/C_L) \approx C_L/g_9$.
With a load capacitance of $C_L = 8$ pF, for instance, the 0% to 100% output rise time is 4 ns, which corresponds to a 10% to 90% rise time of 3.2 ns, and the 100% to 0% fall time is 10 ns, which corresponds to a 90% to 10% fall time of 8 ns.
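The comparator figures quoted above follow from simple arithmetic, reproduced below as a check. It uses only the small-signal values given in the text and the same slew-limited model, $t \approx C_L/g_9$ for a full swing, with the 10% to 90% time taken as 80% of a linear ramp. The variable names are illustrative.

```python
import math

# Low-frequency gains quoted in the text (V/V).
GAIN_INPUT_WITH_HYST = 600.0
GAIN_PER_INVERTER = 20.0

total_gain = GAIN_INPUT_WITH_HYST * GAIN_PER_INVERTER**2
total_gain_db = 20 * math.log10(total_gain)

tau_c = 1 / (2 * math.pi * 30e6)  # first-stage pole at about 30 MHz


def full_swing_time(c_load: float, gm: float) -> float:
    """0-100 % edge: 2.5 V / (I_L / C_L) with I_L ~ 2.5 V * gm, i.e. C_L / gm."""
    i_l = 2.5 * gm
    return 2.5 / (i_l / c_load)


c_load = 8e-12
t_rise = full_swing_time(c_load, 2e-3)    # rising edge driven by P9
t_fall = full_swing_time(c_load, 0.8e-3)  # falling edge driven by N9

print(f"gain {total_gain:.3g} V/V = {total_gain_db:.1f} dB, tau_C = {tau_c * 1e9:.1f} ns")
print(f"rise {t_rise * 1e9:.1f} ns (10-90 %: {0.8 * t_rise * 1e9:.1f} ns), "
      f"fall {t_fall * 1e9:.1f} ns (90-10 %: {0.8 * t_fall * 1e9:.1f} ns)")
```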
The input transistors $N_6$ and $N_7$ have a transconductance $g_C$ of about 700 µA/V, while $P_{M1}$ and $P_{M2}$ have a transconductance $g_M$ of about 300 µA/V. These are the main contributors to the noise of the comparator. Transistor $N_{B7}$ does not contribute, because its noise is common mode while the input stage is differential, so in the comparator the bias filtering capacitor $C_{B7}$ can be omitted. The input-referred white voltage noise density can be expected to be
which, together with the 1/f contributions, corresponds to an input voltage noise of about 65 µV RMS. Compared with the RMS noise at the output of the CSA, which is more than 1 mV RMS even in the best case of a 3.3 pF input capacitance, this contribution is negligible, at least with the attenuation factor $B = 1$. With larger attenuations the weight of the comparator noise grows accordingly, and at $B = 10$ it becomes significant. Since, as already mentioned, the attenuation is only meant to be used with very large signals, where the signal-to-noise ratio is a minor concern, the case $B = 1$ is considered in the following. The jitter on the rising edge of the comparator is expressed by
The calculations leading to equation 7 are reported in appendix A.2. The time constant $\tau_C \approx 5$ ns is set by the bandwidth of the first stage of the comparator. With the threshold set at 300 ke⁻ (48 fC), equation 7 predicts a jitter of 32 ps for 600 ke⁻ (96 fC) signals, of which 24 ps are due to the series noise and 18 ps to the parallel noise. As for the ENC, the 1/f component is negligible. According to equation 7, the jitter is expected to decrease to 8 ps for 1.5 Me⁻ (240 fC) signals. For larger signals equation 7 predicts an unbounded improvement; in reality, the slope of the signal at the first stage of the discriminator is also limited by slew rate. Contrary to equation 7, the jitter is therefore expected at some point to stop decreasing for larger signals and to saturate at a constant value.
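For a rough idea of where the comparator's input noise comes from, the sketch below uses the textbook white-noise expression for a differential pair with a current-mirror load, $e_n^2 = 2\,(4kT\gamma/g_C)(1 + g_M/g_C)$, integrated over the noise bandwidth $(\pi/2) f_p$ of the 30 MHz first-stage pole. The formula, $\gamma = 2/3$ and $T = 300$ K are assumptions, not necessarily the expression used by the authors; the white part alone comes out at about 46 µV RMS, below the 65 µV quoted including 1/f noise. The last lines show why the comparator noise is negligible against more than 1 mV at the CSA output: uncorrelated noises add in quadrature.

```python
import math

K_B = 1.380649e-23  # J/K
T = 300.0           # assumed temperature, K
GAMMA = 2.0 / 3.0   # assumed channel thermal-noise factor

g_c = 700e-6   # input pair N6/N7, A/V
g_m = 300e-6   # mirror P_M1/P_M2, A/V
f_pole = 30e6  # first-stage pole, Hz

# Input-referred white noise of a differential pair with mirror load (both halves contribute).
e_n = math.sqrt(2 * 4 * K_B * T * GAMMA / g_c * (1 + g_m / g_c))
noise_bw = math.pi / 2 * f_pole  # equivalent noise bandwidth of a single pole
v_rms_white = e_n * math.sqrt(noise_bw)

# Effect of the quoted 65 uV (white + 1/f) on a 1 mV RMS CSA output noise.
csa_rms, comp_rms = 1e-3, 65e-6
increase = math.hypot(csa_rms, comp_rms) / csa_rms - 1

print(f"e_n ~ {e_n * 1e9:.1f} nV/sqrt(Hz), white RMS ~ {v_rms_white * 1e6:.0f} uV")
print(f"total noise grows by {100 * increase:.2f}% when the comparator is included")
```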
3 Performance of the prototype
Figure 7 shows the signal at the output of the CSA in "low power" mode, read out at the auxiliary output through the PMOS follower biased with a 1 kΩ resistor to the positive rail. The gain was set to the maximum value ($V_3 = V_4 = 0$, $V_F = 1$), and pulses from 330 ke⁻ (53 fC) to 3.3 Me⁻ (530 fC) were injected at the input by an Agilent 81130A 600 MHz step generator through a 0.5 pF test capacitance. The 10% to 90% rise time of the test signals is 0.6 ns, simulating the typical charge collection time of a fast photomultiplier. The output of the PMOS follower was buffered with a Texas Instruments LMH6703 fast op-amp driving a terminated 50 Ω line. The signals were acquired with an Agilent DCA-X 86100D 20 GHz sampling oscilloscope, with the bandwidth limited to 12 GHz in our measurements.
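Charges in this paper are quoted both in electrons and in femtocoulombs, and the test pulses are injected as voltage steps on a 0.5 pF capacitance. The helper below (names illustrative) converts between the two units and gives the step height $V = Q/C_{test}$ needed for a given charge, assuming an ideal test capacitance with no parasitics.

```python
Q_E = 1.602176634e-19  # elementary charge, C
C_TEST = 0.5e-12       # test capacitance, F


def electrons_to_fc(n_electrons: float) -> float:
    """Charge of n electrons in femtocoulombs."""
    return n_electrons * Q_E / 1e-15


def step_amplitude(n_electrons: float, c_test: float = C_TEST) -> float:
    """Voltage step that injects n_electrons through an ideal capacitance: V = Q / C."""
    return n_electrons * Q_E / c_test


for n in (330e3, 3.3e6):
    print(f"{n / 1e3:6.0f} ke- = {electrons_to_fc(n):5.1f} fC, "
          f"step {step_amplitude(n) * 1e3:6.1f} mV")
```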
The leading edge of the measured analog signal in response to a 330 ke⁻ (53 fC) pulse is 2.8 ns (10% to 90%), its trailing edge is 15.8 ns (90% to 10%), and its width at 50% is 8 ns. The corresponding time constants are $\tau_R = 1.3$ ns and $\tau_F = 7.2$ ns. Due to the finite bandwidth of the PMOS follower, the measured signal is slower than the signal at the output of the CSA that feeds the input of the discriminator. Since the transconductance of the PMOS follower is less than 1 mA/V and its bias resistor is 1 kΩ, the follower gain, roughly $g_m R/(1 + g_m R)$, is below 0.5, so the amplitude of the buffered signal is smaller than at the output of the CSA. The input noise was obtained by measuring the baseline noise at the auxiliary output and referring it to the input of the CSA as an equivalent noise charge (ENC). The measured ENC for an input capacitance of 3. […] mentioned, once the correct rise and fall times measured at the output of the analog buffer are considered. The importance of low noise is mainly related to timing performance, which is discussed in the following.
Figure 8 shows the signal at the output of the discriminator when the CLARO-CMOS is operated in "low power" mode. The threshold code was set to 6, which at the maximum gain corresponds to 800 ke⁻ (128 fC), and signals from 810 ke⁻ (130 fC) to 5.6 Me⁻ (900 fC) were injected at the input. This range of input signals corresponds to the typical single photon response of a photomultiplier in nominal bias conditions. As already mentioned, the output stage of the discriminator is designed to drive a capacitive load of a few pF. In these tests the capacitive load at the output was measured to be 8 pF, contributed by the pads, the QFN48 package, and a short (a few cm) PCB trace to a Texas Instruments LMH6703 fast op-amp used as a low-impedance driver to the sampling oscilloscope. With this load, the 10% to 90% rise time is 2.2 ns and the 90% to 10% fall time is 9.3 ns.
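The time constants quoted for the buffered analog signal follow from its measured edges if each edge is treated as a single exponential, for which the 10% to 90% interval is $\tau \ln 9 \approx 2.2\tau$. For a two-pole pulse this is only an approximation; the function name is illustrative.

```python
import math


def tau_from_10_90(t_10_90: float) -> float:
    """Time constant of a single-exponential edge from its 10%-90% interval (tau * ln 9)."""
    return t_10_90 / math.log(9.0)


# Measured edges of the buffered 330 ke- pulse, in ns.
print(f"tau_R ~ {tau_from_10_90(2.8):.2f} ns, tau_F ~ {tau_from_10_90(15.8):.2f} ns")
```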
The 50% pulse width depends on the amount of charge injected at the input, ranging from 7.2 ns for the smallest signal in figure 8, just above threshold, to 21.7 ns for the largest signal in figure 8, almost a factor of 10 above threshold. The delay between the input charge pulse and the time when the output of the discriminator reaches 50% is 5 ns for signals just above threshold, and decreases to about 2.5 ns for signals well above threshold. The delay is due to the rise time of the CSA pulse at the input of the comparator and to the dependence of the comparator speed on the overdrive. The difference between the two extreme values, about 2.5 ns in "low power" mode, constitutes the time walk of the discriminator, which is critical for timing performance, as discussed in the following. This performance was obtained in "low power" mode, with a continuous power dissipation of 0.7 mW per channel. If the discriminator is triggered at a 10 MHz rate, the average power consumption increases to 1.9 mW per channel. Note that the signals in figures 7 and 8 are shown as acquired by the sampling oscilloscope: the displayed traces are the superposition of samples from many output signals, with the sampling trigger synchronized with the step generator, so the figures also show noise and jitter at a glance. The output signals shown demonstrate the capability of the CLARO-CMOS to count fast pulses from photomultipliers, from the single photoelectron up to larger signals, with low noise, at very high rate (up to 10 MHz), and with a very low power consumption.
When the prototype is operated in "timing" mode, the power consumption increases to 1.5 mW per channel (2.3 mW per channel at a 10 MHz rate). The differences in the output signals between the "low power" and "timing" modes are small: the different bias affects only the output of the CSA, and the difference cannot be directly appreciated on the shape of the buffered signals because of the bandwidth limitation of the auxiliary output buffer. The differences between the two operating modes are visible in the crosstalk and jitter measurements presented in the following.
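For system planning, the two power figures quoted for each mode can be split into a static part plus an energy per hit. The sketch assumes that power grows linearly with the hit rate, which the text does not state; the names are illustrative. With the quoted numbers this gives about 120 pJ per hit in "low power" mode and about 80 pJ in "timing" mode.

```python
# (static power, power at 10 MHz) per channel in W, as quoted in the text.
MODES = {
    "low power": (0.7e-3, 1.9e-3),
    "timing": (1.5e-3, 2.3e-3),
}
RATE_REF = 10e6  # Hz


def energy_per_hit(mode: str) -> float:
    """Extra energy per discriminator firing implied by the two quoted points."""
    p_static, p_ref = MODES[mode]
    return (p_ref - p_static) / RATE_REF


def power_at_rate(mode: str, rate_hz: float) -> float:
    """ASSUMED linear model: P(f) = P_static + E_hit * f."""
    return MODES[mode][0] + energy_per_hit(mode) * rate_hz


for mode in MODES:
    print(f"{mode:9s}: {energy_per_hit(mode) * 1e12:.0f} pJ/hit, "
          f"{power_at_rate(mode, 1e6) * 1e3:.2f} mW at 1 MHz")
```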
Crosstalk
With very fast circuits such as the CLARO-CMOS, crosstalk may be critical: fast signals couple capacitively to neighbouring channels through parasitic capacitances much more easily than slower ones. The level of crosstalk between channels was measured as follows. The gain of the victim channel was set to the maximum value and its threshold was set at 300 ke− (48 fC). No signal was applied to the victim input, while large signals were injected at the input of a neighbouring channel. The crosstalk was estimated from the amplitude of the smallest aggressor signal that triggers the victim discriminator. To simulate the real case, where different pixels of a pixellated photodetector are connected to the CLARO inputs, a crosstalk capacitance $C_{XT}$ was added between the inputs as depicted in figure 9. The input capacitance to ground in this measurement was $C_I$ = 6.5 pF.
The level of crosstalk was measured with different values of $C_{XT}$ in both "low power" and "timing" modes; the results are plotted and linearly fitted in figure 10. The crosstalk on chip, that is with $C_{XT}$ = 0, is negligible: signals up to 10 Me− (1.6 pC) were injected without triggering the victim. Increasing $C_{XT}$ increases the crosstalk correspondingly. The intercepts of the fitted lines are compatible with zero, confirming that no crosstalk is observed unless a capacitance is added between the inputs outside the ASIC. The value of $C_{XT}$ in a given application depends on the type of sensor. For instance, the capacitance between the anodes of a Hamamatsu R7600 Ma-PMT is less than 0.5 pF, which would translate into a crosstalk level below 2% in "low power" mode and below 1% in "timing" mode. The crosstalk is lower in "timing" mode thanks to the lower input impedance, due to the larger loop gain of the CSA, as already discussed. For fast readout of pixellated sensors it is mandatory to keep the parasitic capacitance between neighbouring inputs under control. Where $C_{XT}$ cannot be reduced because of the characteristics of the sensor, a larger $C_I$ should be used: this would affect noise and bandwidth, but would help in eliminating crosstalk.
Figure 10: Crosstalk versus crosstalk capacitance $C_{XT}$ in "low power" and "timing" modes.
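A minimal sketch of the bookkeeping behind this measurement, assuming (our reading, not stated explicitly in the text) that the crosstalk fraction is the victim threshold divided by the smallest aggressor charge that fires the victim. `fit_line` is a plain least-squares fit of the kind used for figure 10; the 15 Me− aggressor value in the example is hypothetical, not a measured point.

```python
def crosstalk_fraction(victim_threshold_e: float, min_aggressor_e: float) -> float:
    """Crosstalk estimated as the victim threshold divided by the smallest
    aggressor charge (both in electrons) that fires the victim discriminator."""
    return victim_threshold_e / min_aggressor_e


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical example: if the smallest aggressor that fires a victim set at
# 300 ke- were 15 Me-, the crosstalk would be 2%.
print(crosstalk_fraction(300e3, 15e6))
```

Fitting crosstalk against $C_{XT}$ with `fit_line` and checking that the intercept is compatible with zero mirrors the argument that no crosstalk originates on chip.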
Timing resolution
To evaluate the timing performance of the CLARO-CMOS prototype, the CSA gain was set to the maximum value and the discriminator threshold was set at 300 ke− (48 fC). Since the timing performance is expected to be directly proportional to the signal-to-noise ratio, using small input signals corresponds to a conservative, worst-case scenario. The time resolution of the setup was estimated to be 7 ps RMS by connecting the Agilent 81130A step generator directly to the Agilent DCA-X 86100D sampling scope. Some of the measurements presented in the following reach 10 ps: in these cases the result is partially limited by the setup. The setup contribution of 7 ps was subtracted in quadrature from the measurements. Moreover, as already mentioned, the 10% to 90% rise time of the input test signals is 0.6 ns, which is not negligible compared to the rise time predicted at the CSA output by equation 2 in "timing" mode with a low input capacitance. However, as expressed by equation 7, the timing resolution on the rising edge of the discriminator signal is limited by the time constant $\tau_C$ of the first stage of the comparator, about 5 ns. Thus the contribution of the 0.6 ns rise time of the test signal generator is expected to be negligible in the jitter measurements, although it may have some impact on the effectiveness of the time-over-threshold compensation presented in the following.
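Subtracting the setup contribution in quadrature assumes that the setup jitter and the ASIC jitter are independent, so their variances add. A minimal sketch, with a hypothetical raw reading of 12.2 ps chosen only to show that it corresponds to about 10 ps after correction:

```python
import math

SETUP_JITTER_PS = 7.0  # generator + sampling scope, measured back to back


def subtract_in_quadrature(measured_ps: float, setup_ps: float = SETUP_JITTER_PS) -> float:
    """Remove an independent setup contribution from a measured RMS jitter."""
    if measured_ps < setup_ps:
        raise ValueError("measured jitter is below the setup contribution")
    return math.sqrt(measured_ps ** 2 - setup_ps ** 2)


# Hypothetical raw reading, not a value from the paper.
print(round(subtract_in_quadrature(12.2), 1))
```

Near 10 ps the correction is large relative to the result, which is why such values are described as partially limited by the setup.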
The overall timing performance of a system composed of a sensor and a low-jitter readout circuit also depends on the precision of the time-walk compensation; otherwise the low jitter would be spoiled by the time walk induced by the amplitude spread of the sensor signals. Figure 11 shows the dependence of the delay on the pulse width, starting from signals just above threshold. The difference in delay over a given range of input charge is the time walk of the discriminator. This is the fundamental curve on which time-walk compensation by time-over-threshold measurement relies, and the slope of the fitted lines can be used to estimate how effectively time over threshold compensates time walk. To first order, the curves of figure 11 do not depend on the threshold. The measurements were taken in both "low power" and "timing" modes. In "low power" mode, as already mentioned, the delay ranges from about 5 ns to 2.5 ns, so the time walk for this range of input signals, that is the difference between the two, is 2.5 ns. In "timing" mode, as shown in figure 11, the time walk of the discriminator is reduced by about a factor of 2. Thus, even though the shape of the output signals and the maximum sustainable rate are the same as in "low power" mode, a time-over-threshold measurement is a factor of 2 more effective in compensating time walk.
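A minimal sketch of first-order time-over-threshold correction, assuming the delay falls linearly with pulse width, delay = d0 − γ·width, where γ is the fitted slope of figure 11. Adding γ·width to the rising-edge time then cancels the amplitude-dependent part of the delay. The offset d0 and the pulse widths below are hypothetical synthetic values, not measurements.

```python
# Slopes of the delay-versus-width fits quoted in the legend of figure 11.
GAMMA = {"low power": 0.113, "timing": 0.055}


def compensate(t_rise_ns: float, width_ns: float, gamma: float) -> float:
    """First-order time-walk correction from time over threshold.

    Assumes delay = d0 - gamma * width, so adding gamma * width to the
    rising-edge time cancels the amplitude-dependent part of the delay,
    leaving only the common offset d0.
    """
    return t_rise_ns + gamma * width_ns


# Synthetic pulses that all arrive at t = 0 (hypothetical d0 and widths).
d0 = 6.0
gamma = GAMMA["low power"]
widths = [8.0, 12.0, 20.0]
rises = [d0 - gamma * w for w in widths]
corrected = [compensate(t, w, gamma) for t, w in zip(rises, widths)]
print(max(rises) - min(rises), max(corrected) - min(corrected))
```

A smaller γ, as in "timing" mode, means less correction is needed for the same width spread, so errors on the measured width propagate less into the corrected time.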
The measured RMS jitter versus input charge is shown in figure 12 for the "low power" mode. The jitter on the rising edge is 113 ps at threshold (about 300 ke−, or 48 fC), decreasing to 34 ps for signals of 560 ke− (90 fC) and reaching 9 ps for large signals (4.5 Me−, or 720 fC). The measured values are in good agreement with the values predicted by equation 7. For larger pulses, the rising-edge jitter stops decreasing and saturates to a constant value.
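In the appendix the rising-edge jitter is obtained by dividing the RMS voltage noise at the first comparator stage by the signal slope at the trigger time, with the slope proportional to the charge in excess of threshold, $Q - Q_{TH}$. The sketch below illustrates that principle under stated assumptions: a two-pole chain $G/((1+s\tau_F)(1+s\tau_C))$, a trigger time held fixed for simplicity, and hypothetical parameter values in arbitrary units. It is not a reconstruction of equation 7.

```python
import math


def output_slope(t, q_excess, gain, tau_f, tau_c):
    """dV/dt of q_excess * gain * (exp(-t/tau_f) - exp(-t/tau_c)) / (tau_f - tau_c),
    the impulse response of the assumed chain gain / ((1 + s tau_f)(1 + s tau_c))."""
    k = q_excess * gain / (tau_f - tau_c)
    return k * (math.exp(-t / tau_c) / tau_c - math.exp(-t / tau_f) / tau_f)


def timing_jitter(sigma_v, t_th, q_excess, gain, tau_f, tau_c):
    """RMS timing jitter: RMS voltage noise divided by the slope at t_th."""
    return sigma_v / abs(output_slope(t_th, q_excess, gain, tau_f, tau_c))


# Hypothetical values: tau_f = 50, tau_c = 5, trigger at t = 2 (same time unit).
j1 = timing_jitter(1.0, 2.0, q_excess=1.0, gain=1.0, tau_f=50.0, tau_c=5.0)
j2 = timing_jitter(1.0, 2.0, q_excess=2.0, gain=1.0, tau_f=50.0, tau_c=5.0)
print(j1 / j2)  # doubling the charge above threshold halves the jitter
```

In this simplified model the jitter scales as $1/(Q - Q_{TH})$; effects such as the saturation at large pulses mentioned above are outside its scope.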
The jitter on the falling edge is larger because the transition is slower. Moreover, the falling-edge jitter is affected by a small disturbance that appears on ground when the discriminator triggers, which explains the non-monotonic behaviour of the falling-edge jitter shown in figure 12. However, the falling edge is used only to compensate time walk, so the weight of the falling-edge jitter in a timing measurement is given by the relation between time walk and pulse width, that is by the slope γ of the lines fitted to the data in figure 11. In other words, the jitter on the falling edge is normalized according to
$$\sigma_{\mathrm{fall}}^{\mathrm{norm}} = \gamma\,\sigma_{\mathrm{fall}},$$
where γ is 0.113 in "low power" mode and 0.055 in "timing" mode, as shown in the legend of figure 11. The falling-edge jitter normalized with this weight is shown in the plot: it is about 100 ps just above threshold, decreasing to 13 ps with large signals. The overall timing performance (including time-walk compensation) is given by the quadratic sum of the rising-edge jitter and the normalized falling-edge jitter, and is shown by the red curve in figure 12, going from 135 ps just above threshold to 50 ps at 780 ke− (125 fC), and further decreasing to 17 ps with 4.5 Me− (720 fC) signals. The same measurements are shown in figure 13 for the "timing" mode. The RMS jitter on the rising edge goes from 92 ps just above threshold (300 ke−, or 48 fC) to 10 ps with large signals (4.5 Me−, or 720 fC). The rise time $\tau_R$ of the CSA pulse is now smaller than in "low power" mode, so the rising-edge jitter is slightly smaller; but since the speed is in any case limited by the first stage of the discriminator, the values are still in agreement with those predicted by equation 7. Since the time-walk compensation is now twice as effective, the normalized falling-edge jitter goes from 44 ps to 6 ps and becomes almost negligible. The overall timing resolution is thus 102 ps just above threshold, quickly decreasing below 50 ps above 380 ke
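The combination rule described above can be checked against the quoted "timing" mode figures: the quadratic sum of 92 ps (rising edge) and 44 ps (normalized falling edge) gives the 102 ps reported just above threshold. A minimal sketch, with hypothetical helper names, that follows the text's quadrature combination:

```python
import math


def normalized_fall_jitter(fall_ps: float, gamma: float) -> float:
    """Weight of the falling-edge jitter after time-over-threshold correction."""
    return gamma * fall_ps


def overall_resolution(rise_ps: float, fall_normalized_ps: float) -> float:
    """Quadratic sum of rising-edge and normalized falling-edge jitter."""
    return math.hypot(rise_ps, fall_normalized_ps)


# "timing" mode values quoted above: just above threshold and for large signals
print(round(overall_resolution(92.0, 44.0)))   # 102
print(round(overall_resolution(10.0, 6.0), 1))
```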
Conclusions
The first prototype of the CLARO-CMOS was thoroughly characterized, with particular emphasis on its timing resolution, also considering the effectiveness of time-walk compensation through time-over-threshold measurement. The prototype performs as expected, proving the adequacy of the design approach described. The time resolution obtained, down to 10 ps RMS for input charge pulses corresponding to single-photoelectron signals from a typical photomultiplier, is outstanding considering the very low power dissipation of the prototype, below 1 mW per channel.
where $s = i\omega$ is the complex frequency, $\tau_F = C_F R_F$, and $\tau_R = C_I/(g_2 g_1 R_C)$ as given by equation 2. The response to a delta-like pulse $Q\delta(t)$ is obtained by multiplying equation 9 by $Q$ and taking the inverse Laplace transform, which gives equation 1. A white current noise density $i_n$ at the input is converted to a voltage noise at the output given by
A white voltage noise density $e_n$ at the input can be converted to its Norton equivalent, that is a current noise density $sC_I e_n$. The corresponding voltage noise density at the output is
and the same holds for a low-frequency voltage noise $A_f/f$, which gives
To obtain the squared RMS noise at the output, one must integrate the squared magnitudes of equations 10, 11 and 12 in $d\omega/2\pi$ over the whole frequency spectrum. Equation 10 gives
equation 11 gives
and equation 12 gives
If one lets $\tau_R \to \tau_F$, the above expressions reduce to the well-known expressions for an RC-CR filter [16]. In our case $\tau_R \le 0.3\,\tau_F$, so we can approximate by expanding to first order in $\tau_R/\tau_F$. Equation 13 becomes
and equation 15 becomes
Summing together equations 16, 17 and 18 one obtains the total squared RMS noise at the output:
The square root of equation 19 gives the total RMS noise at the CSA output. To obtain the noise referred to the input as an equivalent noise charge (ENC), one must calculate
where $V_O^{\max}(Q)$ is the peak output voltage for a charge $Q$, which can be obtained from equation 1 and is
which, expanding for small $\tau_R/\tau_F$, becomes
where the expression was approximated using the fact that $x^x \approx 1 + x\ln x$ for small $x$, and all terms of order two or higher in $\tau_R/\tau_F$ were dropped. Equation 22 can be further approximated by
which to first order in $\tau_R/\tau_F$ can also be written as
By taking the square root of equation 25 we obtain equation 3.
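The small-$x$ step used for the peak voltage can be checked numerically. The sketch assumes, consistently with the use of $x^x \approx 1 + x\ln x$ above but without the missing equation 1 to confirm it, that the CSA impulse response is a double exponential with time constants $\tau_F$ and $\tau_R$. Normalized to the ideal peak $Q/C_F$ and with $x = \tau_R/\tau_F$, its maximum is then $x^{x/(1-x)}$. The function names are hypothetical.

```python
import math


def peak_factor_exact(x: float) -> float:
    """Peak of (exp(-t) - exp(-t/x)) / (1 - x), time in units of tau_F,
    with x = tau_R / tau_F < 1. Closed form: x ** (x / (1 - x))."""
    return x ** (x / (1.0 - x))


def peak_factor_first_order(x: float) -> float:
    """Small-x approximation based on x**x ~= 1 + x * ln(x)."""
    return 1.0 + x * math.log(x)


def peak_factor_numeric(x: float, t_end: float = 5.0, steps: int = 100_000) -> float:
    """Brute-force maximum of the same double exponential on a time grid."""
    dt = t_end / steps
    return max((math.exp(-k * dt) - math.exp(-k * dt / x)) / (1.0 - x)
               for k in range(steps + 1))


for x in (0.01, 0.05, 0.1):
    print(x, peak_factor_numeric(x), peak_factor_exact(x), peak_factor_first_order(x))
```

The grid search confirms the closed form. The first-order expression stays within about 1% for small $x$, and the gap grows as $\tau_R$ approaches $\tau_F$.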
A.2 Jitter
To calculate the impact of the CSA noise on the timing resolution of the discriminator, one must consider the overall transfer function of the CSA and of the first stage of the comparator at the moment of triggering, that is when the voltages at its two inputs are almost equal. Neglecting hysteresis, the transfer function of the first comparator stage in the complex frequency domain can be modelled as $G/(1 + s\tau_C)$. Combining this with equation 9 gives the transfer function of the whole chain from the CSA input to the discriminator output:
where equation 9 was approximated for $\tau_R \to 0$, since the bandwidth is now limited by $\tau_C$, which is expected to be larger than $\tau_R$, at least for small values of the input capacitance $C_I$. As for the squared RMS noise at the output of the CSA alone, given by equations 13, 14 and 15, we can calculate the squared RMS noise at the output of the first stage of the discriminator. For a current noise source $i_n$ we obtain
for the voltage white noise
and for the voltage low frequency noise
where the expressions were approximated for $\tau_C \ll \tau_F$. The square root of their sum gives the total RMS noise at the output of the first stage of the discriminator. To obtain the corresponding timing resolution, one must divide the voltage noise by the slope of the signal at the output:
where $t_{TH}$ is the time at which the second stage of the discriminator triggers. By multiplying equation 26 by the input charge in excess of threshold, $Q - Q_{TH}$, computing its inverse Laplace transform and differentiating it with respect to time, one obtains
which, considering that $\tau_F \gg \tau_C$, becomes
Assuming that the second stage of the discriminator triggers for $t \ll \tau_C$, equation 32 gives
