Abstract-We describe a high-rate energy-resolving photon-counting ASIC intended for spectral computed tomography. The chip has 160 channels and 8 energy bins per channel. It demonstrates a noise level of ENC electrons at 5 pF input load at a power consumption of mW/channel. The maximum count rate is 17 Mcps at a peak time of 40 ns, made possible by a new filter reset scheme, and the maximum read-out frame rate is 37 kframe/s.
Index Terms-CMOS, photon counting, spectral computed tomography, x-ray detection.
I. INTRODUCTION
PHOTON counting energy resolved direct conversion detectors for computed tomography (CT) allow exploitation of the spectral information of each incident photon. Since linear attenuation coefficients of tissue exhibit large energy dependence in the energy range used in CT (30-120 keV), the signal-to-noise ratio of x-ray quanta will depend on photon energy. Optimum energy weighting schemes were derived in [1] and it was later shown that the information content of an interacting photon is to a good approximation proportional to $E^{-3}$, where $E$ is the photon energy [2]. This is in sharp contrast to commonplace energy integrating x-ray detectors, where the signal generated is proportional to energy. Indeed, applying optimal weighting schemes to CT increases contrast-to-noise ratios by 15%-60% at a given dose [3]-[5]. Knowledge of the x-ray spectrum before and after the object being imaged also allows for decomposing the estimated linear attenuation into known energy basis functions. One of these is typically selected to capture the energy dependence of the photoelectric effect, one the energy dependence of Compton scattering, and a third the energy dependence of a contrast agent exhibiting a sharp k-edge in the applied energy interval [6]. Such schemes have successfully been applied to photon counting energy sensitive detectors [7].
A key component in such a system is a photon-counting ASIC with energy classification. Particularly critical is the maximum count rate, the minimum detectable energy (noise level), and the power consumption (as tomography requires very many channels).
Several photon-counting ASICs have been reported in recent years, but only a few address the count rates needed for computed tomography. Pangaud et al. [8]-[10] presented a read-out ASIC with 9600 channels and a single energy bin, which can manage 1 Mcps and a read-out rate of 500 frames/s, and exhibits a noise level of about 150 electrons RMS at a power consumption of 40 per channel. Moraes et al. [11] presented a 128-channel ASIC with 2 energy bins which can manage 5 Mcps. It has a noise level of about 700 electrons RMS for 5 pF detector capacitance and a power consumption of 2.1 mW per channel. Finally, Steadman et al. presented a 64-channel high rate energy-resolving ASIC in [12]. They have a peaking time of about 12 ns, allowing a counting rate of up to 13 Mcps. The noise level is about 340 electrons RMS, and they claim "several" energy bins. They later reported x-ray detection by combining the ASIC with a CdTe detector [13].
The ASIC described here is intended for detecting holes from silicon strip detectors in a standard computed tomography context. It addresses the need for very high count rates and implements 8 energy bins (8 counters) per channel. A typical maximum x-ray photon energy in computed tomography is 120 keV, so the maximum number of holes to be detected is about 40 000 (corresponding to 144 keV with a silicon detector). A computed tomography system utilizing silicon strip detectors was analyzed in [14] and [15]. In this proposed architecture, one detector module has 50 strips with edge-on geometry, each divided into 16 sectors (in order to limit the maximum rate at each read-out channel), with a total silicon area of 30 × 20 mm². From this analysis, it is possible to extract requirements on the read-out ASIC. Because of the relatively long strip detectors (up to 4.8 mm long sectors), and a complicated wiring between detectors and ASIC, we expect a detector capacitance of up to 3 pF. The maximum expected count rate per channel is estimated at 10 Mcounts/s. At the same time, however, the integration time (peak time) cannot be too short because of the nonzero duration of the detector current from a single event. In [15] an optimal peak time of about 40 ns was estimated for the actual detector geometry used. In order to combine a high rate with a relatively large peak time, we have implemented a new reset mechanism for the shaping filter in this design. For the energy classification, we decided to implement 8 energy bins. The energy resolution, as well as the minimum detectable energy, is controlled by the thermal noise level of the read-out ASIC. In [15] a thermal noise level of 300 electrons RMS was used, and found sufficient. The final issue is the power consumption. With 50 strips of 16 sectors each, each detector module requires 800 read-out channels.
0018-9499/$26.00 © 2011 IEEE
In order to keep the power consumption within manageable limits, we need to limit the total power per module to 4 W, which means 5 mW per channel (note that we expect to have about 1500 detector modules in a full system). Furthermore, we need to limit the total area used by the ASICs to about 30 × 10 mm², in order to house the ASICs on the detector die. We decided to use 5 ASICs per detector, each with an area of about 5 × 6 mm² and stud-bonded to the detector. Each ASIC has 160 read-out channels. Finally, because of the relatively large power consumption of a full system, we expect that the ASICs will work under changing temperature. We therefore require that the ASIC work correctly under temperature variation in the range 30-70 °C. This paper is organized as follows. In Section II we describe some preliminary considerations, in Section III we describe the analog channel and in Section IV the digital block. We then describe the chip implementation in Section V and the measurements of our fabricated chip in Section VI. The paper is concluded by a discussion in Section VII and a conclusion in Section VIII.
II. PRELIMINARY CONSIDERATIONS
As discussed in the introduction, we aim at a read-out ASIC with 160 input channels, 8 energy bins, a maximum rate of more than 10 Mcounts/s per channel, a maximum noise level of 300 electrons RMS, and a maximum power consumption of 800 mW (5 mW per channel). As this ASIC is aimed for a 16 sector silicon strip detector with a pixel size of 500 , the maximum allowable count rate is . In addition we estimate a frame time on the order of 15-30 µs and a need for simultaneous data collection and data transfer. Critical overall issues are to manage the channel noise level and speed for the analog part, and data management and data I/O for the digital part, within a strict power budget.
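The module-level numbers quoted so far can be cross-checked with a few lines of arithmetic. The sketch below is only an illustration using the figures stated in the introduction (50 strips, 16 sectors, 4 W per module, 160 channels per ASIC); the function and constant names are hypothetical and not part of the design.

```python
# Illustrative cross-check of the module budget; all names are hypothetical.
STRIPS_PER_MODULE = 50      # strips per detector module (from the text)
SECTORS_PER_STRIP = 16      # sectors per strip (from the text)
CHANNELS_PER_ASIC = 160     # read-out channels per ASIC (from the text)
MODULE_POWER_W = 4.0        # power limit per detector module (from the text)


def module_budget():
    """Return (channels, asics, mW per channel, mW per ASIC) for one module."""
    channels = STRIPS_PER_MODULE * SECTORS_PER_STRIP
    asics = channels // CHANNELS_PER_ASIC
    mw_per_channel = MODULE_POWER_W / channels * 1e3
    mw_per_asic = mw_per_channel * CHANNELS_PER_ASIC
    return channels, asics, mw_per_channel, mw_per_asic


if __name__ == "__main__":
    channels, asics, mw_ch, mw_asic = module_budget()
    print(f"{channels} channels, {asics} ASICs, "
          f"{mw_ch:.1f} mW/channel, {mw_asic:.0f} mW/ASIC")
```

The result reproduces the 800 channels, 5 ASICs, 5 mW per channel and 800 mW per ASIC used as targets in this section.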
We have chosen to implement the chip in a 180 nm standard CMOS process, utilizing a somewhat reduced supply voltage of 1.5 V to save power. In order to save input pins, we have chosen single-ended analog inputs instead of differential. In Fig. 1 , we show an overall block diagram of the chip.
The chip is divided into two large blocks, one with 160 analog channels, and one digital with counters, readout circuitry, and some control logic. In addition, there are 8 global programmable thresholds, including DACs, an interface including a set of registers controlling various analog settings and functions which can be written via the LVDS interface, and two LVDS I/O blocks.
Regarding the analog part, we allocate 3 mW of the power budget to each analog channel. We expect a total input capacitance of about 5 pF, which includes detector capacitance, wiring between detectors and readout ASIC, pads, protection circuits, and input transistor gate capacitance. We further require a noise level of 300 electrons RMS down to a peaking time of 10 ns. The realism of this power budget can be estimated as follows [16]. Using a traditional charge sensitive amplifier (CSA) as input stage, its noise contribution can be expressed as [17]

$$\mathrm{ENC} = A \, \frac{C_{in}}{\sqrt{g_m t_p}} \qquad (1)$$

where ENC is the noise expressed in equivalent number of electrons, $A$ is a constant (evaluated for a 2nd order shaping filter, where we corrected the expression in [17] by an increased thermal noise factor for submicron transistors, see [16]), $C_{in}$ is the total input capacitance, $t_p$ is the peaking time, and $g_m$ is the input transistor transconductance. Assuming that an ENC of 200 out of 300 is allocated to the input stage, we need a $g_m$ of about 20 mA/V to meet the noise requirement. From this, we may estimate the current consumption of the input stage at about 1.5 mA (see [18]), which with a supply voltage of 1.5 V gives a power consumption of 2.25 mW. As we expect the input CSA to dominate the power consumption of the analog chain, we judge that a power budget of 3 mW is realistic. The analog channel further needs an amplifier, a shaper filter, and 8 comparators for the 8 energy thresholds. The shaper filter is of second order and is implemented as two Gm-C filters. In order to speed up the filter recovery time, we have introduced a filter reset function, further described in Section III-D. Finally, we need an offset calibration scheme to remove various offsets in the analog chain. This is particularly needed because of the single-ended inputs. The analog part is modularized by combining 20 channels into one module. By sharing bias circuits within each module, further power saving is achieved.
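The power estimate for the input stage is a one-line calculation, sketched below with the values quoted in the paragraph above (20 mA/V, 1.5 mA, 1.5 V, 3 mW budget). The names are hypothetical, and the transconductance efficiency printed here is simply the ratio of the two quoted numbers, not a value taken from [18].

```python
# Rough check of the CSA power estimate; names are illustrative only.
G_M_MA_PER_V = 20.0        # required input transconductance (from the text)
I_CSA_MA = 1.5             # estimated CSA supply current (from the text)
V_SUPPLY = 1.5             # supply voltage in volts (from the text)
CHANNEL_BUDGET_MW = 3.0    # analog power budget per channel (from the text)


def csa_power_mw(i_supply_ma, v_supply):
    """Static power in mW for a supply current in mA and a supply voltage in V."""
    return i_supply_ma * v_supply


if __name__ == "__main__":
    p_csa = csa_power_mw(I_CSA_MA, V_SUPPLY)
    gm_over_i = G_M_MA_PER_V / I_CSA_MA  # implied g_m / I ratio in 1/V
    print(f"CSA power: {p_csa:.2f} mW, "
          f"headroom: {CHANNEL_BUDGET_MW - p_csa:.2f} mW, "
          f"implied g_m/I: {gm_over_i:.1f} 1/V")
```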
The digital block is designed in Verilog and then synthesized into logic and layout (except the LVDS blocks, which were custom designed). Assuming that half of the counts may fall within one energy bin, we can expect a maximum of about 150 counts per channel per energy bin. We therefore use 8b counters for each channel and energy bin. We thus need to read out 160 × 8 × 8 = 10 240 bits in 15 µs, or 682 Mb/s. This can be accomplished through 2 serial data links, each running at double data rate, i.e., transmitting data on both clock edges, thus requiring a clock frequency of 170 MHz. At 30 µs frame time, 85 MHz is sufficient. We have chosen to design the chip for a target clock frequency of 200 MHz, with an option to operate it at 100 MHz. In addition to the readout function, the digital part also performs ASIC control, which is commanded via a serial input port. The most power demanding function of the digital part is the 1280 8b counters running at 100-200 MHz clock frequency. In order to minimize power consumption and area of these counters, they were implemented as linear feedback shift registers (LFSR) instead of as full binary counters. To further minimize digital power, we utilized low voltage swing LVDS digital I/O [19] as well as clock gating techniques.
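The data-rate argument above can be reproduced as follows. This is a minimal sketch using only the figures in the text (160 channels, 8 bins, 8b counters, two double-data-rate links); the function name and parameters are hypothetical.

```python
# Sketch of the readout bandwidth estimate; names are illustrative only.
def required_clock_mhz(frame_time_us, channels=160, bins=8, counter_bits=8,
                       links=2, bits_per_link_per_clock=2):
    """Return (payload bits, bit rate in Mb/s, clock in MHz) for one frame.

    bits_per_link_per_clock=2 models double data rate (both clock edges).
    """
    total_bits = channels * bins * counter_bits
    bitrate_mbps = total_bits / frame_time_us          # bits per µs equals Mb/s
    clock_mhz = bitrate_mbps / (links * bits_per_link_per_clock)
    return total_bits, bitrate_mbps, clock_mhz


if __name__ == "__main__":
    for frame_time_us in (15.0, 30.0):
        bits, rate, clock = required_clock_mhz(frame_time_us)
        print(f"{frame_time_us:.0f} µs frame: {bits} bits, "
              f"{rate:.0f} Mb/s, {clock:.0f} MHz clock")
```

This counts counter payload only; status and parity bits added by the actual protocol increase the total slightly.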
III. ANALOG CHANNEL

A. Analog Front-End
We have chosen to follow the established principle of a charge sensitive amplifier followed by a semi-Gaussian shaper [17]. To eliminate undershoot due to the decay time of the CSA feedback, we have incorporated a pole zero cancellation (PZC) circuit [20]. In order to keep flexibility for possible future developments, we have chosen to implement three peak time values: 10 ns, 20 ns, and 40 ns. The complete front-end is shown in Fig. 2.
As mentioned above, the most critical circuit from a noise perspective is the charge sensitive amplifier (CSA). For the CSA, we have chosen single-ended folded cascode solution with a PMOS input transistor, see Fig. 3 . By choosing a PMOS input transistor, we expect that thermal noise will dominate over 1/f noise [16] . The CSA was dimensioned according to the estimates above, that is with a of the input stage of about 20 mA/V at a supply current of about 1. . This is sufficient to drive a feedback resistor of 3 , so we need no buffer stage. Simulation indicates that the PMOS fulfills our assumption of low 1/f noise compared to thermal noise. Simulations further show that the feedback resistor will have small effect on the thermal noise if larger than about 3 [16] . The feedback capacitor value is . 
TABLE I PARAMETER VALUES
The feedback resistor is implemented with a PMOS transistor, following a scheme similar to [21] , with a value programmable between 0.8 and 3.5 . The second amplifier uses the same topology as the CSA and achieves an open loop gain of 64 dB at 120 supply current. It performs three of these functions: it contributes voltage gain, it performs cancellation of the pole formed by and , and it contributes one pole to the shaping function differentiation. The voltage gain is controlled by , the pole is cancelled if
, and the differentiator time constant is . We have chosen a second order semi-Gaussian shaping filter with a transfer function of [17] (2) For a second order filter, the peak time is given by , and the time response of the filter is given by (3) where is the peak amplitude, occurring for . Here, the first pole differentiator time constant is controlled by and the two other poles are implemented by two Gm-C filters, see Fig. 4 . The Gm elements are implemented as single stage differential amplifiers with linearity improvement as in [22] . No special actions for the matching between the differentiator time constant and the poles were taken, as simulations indicate this to be uncritical.
In order to save power, we sought the lowest supply current for the second amplifier and the Gm elements that does not cause too much thermal noise. This was done by reducing the supply currents until the simulated total noise started to increase. This resulted in a supply current of 120 µA for the amplifier and 100 µA for all four Gm cells. In addition, by sharing bias circuits between 20 front-ends, the bias circuit overhead is limited to 7%. As a result, the full analog front-end consumes 2.9 mW including shared bias currents, of which 86% is allocated to the CSA.
As mentioned above, we implemented three values of peak time. This was accomplished by programming several elements as shown in Table I . The reason to change both and is that we need to keep the gain between different peak times in order to keep the same pulse amplitude for a given input charge.
is realized in the same way as , and follows the value of when it is changed. The different values of , , , and are implemented by switching in one or several similar elements. The basic Gm stage consists of 2 identical elements in parallel. By powering down one of these elements, we halve the Gm value. The total low frequency gain of the second amplifier and the filters is 32, given by 8 from the second amplifier and 2 from each of the filters (2 from the fact that is connected to both Gm elements for each filter, see Fig. 4). All programming is controlled by the serial control interface (see Section IV).
B. Comparators and Pulse Amplitude Detection Procedure
In order to manage both speed and power constraints, we decided to utilize clocked comparators connected directly to the filter outputs without analog peak detectors. The comparators are clocked by the main clock (at 100 or 200 MHz) and are active only at negative clock edges. This leads to a small error in the detected amplitude, as we do not hit the exact pulse peak for each pulse. Each channel has 8 comparators, connected to 8 different DAC levels, whose voltage levels are controlled by the serial control interface. The actual procedure for detecting pulses and pulse amplitudes is as follows; see Fig. 5.
When the input signal to any comparator exceeds the respective threshold, and we have negative clock edge, it will turn on. A digital register will be set for each comparator which has turned on. We thus have a digital peak detector. After the turn on of comparator0, corresponding to the lowest threshold, we wait clock cycles (programmed to occur after the pulse peak; in Fig. 5 ), then we compare all register outputs and increment the counter corresponding to the register with the highest number (corresponding to the highest threshold which was passed by the pulse height). After this, the registers are reset and we wait another clock cycles before accepting a new input pulse. This procedure can be extended by introducing a filter reset function, see Section III-D below. In this way we always increment the counter corresponding to the actual pulse height. and depends on the peak time used; should make the comparators read after the peak is passed and should make sure that the pulse has decayed below the lowest threshold before detecting a new event.
and are controlled by the digital block and programmable through the serial control interface.
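The detection procedure can be illustrated with a small behavioral model. The sketch below is not the authors' Verilog; it assumes the filter output is available as one sample per active clock edge, and the names `classify_pulses`, `n_peak` and `n_recover` (standing in for the two programmable delays) are hypothetical.

```python
# Behavioral sketch of the digital peak detector; all names are hypothetical.
def classify_pulses(samples, thresholds, n_peak, n_recover):
    """Count pulses per energy bin from samples taken at the active clock edge.

    thresholds must be sorted in ascending order; n_peak and n_recover are the
    two programmable delays (in clock cycles) and must both be at least 1.
    """
    if n_peak < 1 or n_recover < 1:
        raise ValueError("delays must be at least one clock cycle")
    counters = [0] * len(thresholds)
    latched = [False] * len(thresholds)
    state, wait = "idle", 0
    for value in samples:
        fired = [value > th for th in thresholds]
        if state == "idle":
            if fired[0]:                      # lowest threshold starts an event
                latched = fired[:]
                state, wait = "measure", n_peak
        elif state == "measure":
            # Registers stay set once their comparator has turned on.
            latched = [a or b for a, b in zip(latched, fired)]
            wait -= 1
            if wait <= 0:
                highest = max(i for i, hit in enumerate(latched) if hit)
                counters[highest] += 1        # only one counter per event
                latched = [False] * len(thresholds)
                state, wait = "recover", n_recover
        else:                                  # wait for the pulse to decay
            wait -= 1
            if wait <= 0:
                state = "idle"
    return counters


if __name__ == "__main__":
    thresholds = [10, 20, 30, 40, 50, 60, 70, 80]
    pulse = [0, 5, 15, 35, 45, 38, 22, 8, 0, 0]
    print(classify_pulses(pulse, thresholds, n_peak=3, n_recover=3))
```

With the example pulse, whose sampled maximum lies between the fourth and fifth thresholds, only the counter with index 3 is incremented.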
The comparators themselves are implemented as two differential preamplifiers followed by a sense amplifier latch, two inverter buffers, and an SR latch, see Fig. 6 . The preamplifiers are offset calibrated and prevent the inherent offset of the latch to dominate the total offset. The sense amplifier latch uses PMOS inputs and is active for the negative clock edge. The SR latch guarantees stable output levels.
C. Offset Calibration
We have here a single-ended analog front-end with high gain. It is therefore sensitive to offset variations. The main sources of variations are manufacturing variations and temperature variations. As we expect temperature variations to occur during one measurement, we have chosen to perform offset calibration semi-continuously. As we expect to use relatively short frame times , we may store the offset error on a capacitor. We thus perform offset calibration for each filter before each frame recording by forcing the filter outputs to the reference voltage during 150 ns and storing the offset calibration voltage on a capacitor using the circuit in Fig. 7 .
All offset errors occurring before the first filter will be handled by the calibration of that filter. Offset errors occurring in the comparator are partly calibrated as described in Section III-B. With this scheme, offset calibration is performed during measurement, that is with input pulses to the front-end. In order to limit the effect of the input pulses, we remove in Fig. 2 during offset calibration. This is equivalent to introducing a low-pass filter in the analog path during offset calibration. In addition to removing the transient signal, the low pass filter also reduces the noise level during offset calibration. Simulations indicate that offset calibration increases the total noise level by about 10%. This is then similar to a well-designed classical baseline restoration circuit [23], [24]. One drawback with this solution is that we calibrate towards the time-average signal rather than towards "zero", which leads to rate-dependent offset at large rates.
Fig. 8 (caption fragment): With the filter reset mechanism, the filter output is shorted a certain time after pulse detection (at the "ticks") resulting in the solid curve. Note that the first two solid peaks are similar in amplitudes and that we are able to detect also a smaller pulse after a strong one (third peak).
D. Filter Reset Function
Two problems occur at high count rates in a traditional design with a charge sensitive amplifier followed by a shaper. When several photons arrive close in time, the resulting shaped pulses overlap and add, see Fig. 8, dashed curve. This has two consequences: first, photons close in time are not distinguished but may be counted as one, leading to reduced counting rates [25]; second, the second of two close pulses will appear with a false amplitude. The latter is particularly important for energy resolving detectors, as it leads to distortion of the observed energy distribution. We have therefore introduced a filter reset mechanism to mitigate these effects.
The principle of this mechanism is demonstrated in Fig. 8. A pulse from the shaping filter is detected at a first threshold and its amplitude is measured after a fixed delay, as described in Section III-B. When the pulse peak has passed, the filters are reset, making their output zero (or rather equal to the reference voltage) during a certain time. The reset is accomplished by simply shorting the capacitors (Fig. 4) to the reference voltage. When the filter reset is released, there is no remnant of the first photon, so any new photon can be detected with the correct amplitude. The filter reset function does not add any noise to the signal. During reset, the filters attenuate noise, which dominates over the increased noise bandwidth. When the switches are opened, there is no signal memory which might add to the noise level.
The reset function not only allows the detection of a new photon relatively close to the first one (compared to the peak time), but also allows the detection of a small amplitude after a large one. The observed rate versus the true rate can be calculated according to the non-paralyzable model in [25]:

$$m = \frac{n}{1 + n\tau} \qquad (4)$$

where $m$ is the observed rate, $n$ is the true rate, and $\tau$ is the dead time, in our case the product of the minimum number of clock pulses between pulse detection and filter reset release and the clock period. For in the present design. For the conventional case, $\tau$ is given by the length of a pulse at its "base", that is the pulse length at the threshold value. Assuming an average pulse height corresponding to 60 keV and a threshold corresponding to 5 keV, the pulse shape of (3) gives . In our case, we demonstrated at which is then an improvement of about compared to the conventional case. In addition, there will be no distortion of the energy distribution.
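The comparison between the conventional dead time and a reset-controlled dead time can be explored numerically. The sketch below assumes the standard normalized second-order semi-Gaussian pulse, $(t/t_p)^2 e^{2(1 - t/t_p)}$, as the pulse shape; this is an assumption standing in for (3), not a form quoted from the text. The reset dead time used in the example is a placeholder, not the value of this design, and all function names are hypothetical.

```python
import math


def semi_gaussian(t, t_peak):
    """Assumed normalized CR-RC^2 pulse with unit peak at t = t_peak."""
    if t <= 0.0:
        return 0.0
    x = t / t_peak
    return x * x * math.exp(2.0 * (1.0 - x))


def _bisect(f, lo, hi, iterations=80):
    """Find a root of f in [lo, hi], assuming f changes sign on the interval."""
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if (f(lo) > 0.0) == (f(mid) > 0.0):
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def width_at_threshold(t_peak, rel_threshold):
    """Pulse length at its base: time spent above rel_threshold times the peak."""
    g = lambda t: semi_gaussian(t, t_peak) - rel_threshold
    t_up = _bisect(g, 1e-12 * t_peak, t_peak)      # rising-edge crossing
    t_down = _bisect(g, t_peak, 50.0 * t_peak)     # falling-edge crossing
    return t_down - t_up


def observed_rate(true_rate, dead_time):
    """Non-paralyzable dead-time model, m = n / (1 + n * tau)."""
    return true_rate / (1.0 + true_rate * dead_time)


if __name__ == "__main__":
    t_peak = 40e-9                                  # peak time from the text
    tau_conv = width_at_threshold(t_peak, 5.0 / 60.0)
    tau_reset = 60e-9                               # placeholder value only
    for n in (1e6, 10e6, 50e6):
        print(f"n = {n:.0e}/s: conventional {observed_rate(n, tau_conv):.3e}, "
              f"with reset {observed_rate(n, tau_reset):.3e}")
```

The model saturates at $1/\tau$ for large true rates, so any reduction in dead time translates directly into a higher maximum observed rate.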
E. Reference Voltage and DACs
A global reference voltage of nominally 750 mV is generated by a simple resistive divider between analog supply and analog ground. This reference voltage is used both for DACs and as reference when resetting the analog channels. Eight DACs are implemented by mixed 2R-R resistor ladder and coded current generator techniques. Each DAC is controlled from one of the registers in the serial control interface. The DAC output voltage is nominally , where is adjustable between 0 and 300 mV in 256 steps. It is possible to adjust the DAC reference to -, with , which permits the evaluation of noise and offset around zero ( to ) at zero input.
IV. DIGITAL BLOCK
As shown in Fig. 1, the digital section of the ASIC consists of two parts: the serial control interface and the digital count/readout part. The serial control interface contains various configuration registers that control the operation of the ASIC and which can be written via an LVDS interface. The digital count/readout part includes an interface to the analog part and the counters needed to count the events detected by each analog channel. It further transmits all recorded data via two LVDS ports, each transmitting at double data rate. The bit rate for the LVDS output is thus four bits per clock cycle, which allows for 400 Mb/s (800 Mb/s) at 100 MHz (200 MHz) clock frequency to be transferred out of the ASIC. In addition, the digital part includes various features useful for testing of both the analog and the digital part.
Due to the tight power budget, the power consumption of the digital part was an important optimization goal during the design process. To reduce power, clock gating is heavily used in the digital part of the ASIC. For example, all 1280 counters are clock gated on an individual basis. Similarly, all 1280 intermediate registers seen in Fig. 9 are clock gated on a per-channel basis.
A. Serial Control Interface
The serial control interface accepts 16b commands via the LVDS serial input port. One command can be used to write an 8b word into one selected register out of 64 registers. In addition to these write commands, there is one NOP (no operation) command used to introduce a known time delay of 20 clock cycles, and one READOUT command used to start the readout process. The registers are used to control all analog circuit parameters (choice of capacitor values, resistor values, current values, etc.), to set all DAC values, to control various other parameters (delays, modes of operation, etc.), and to control test modes. The READOUT command moves the counter values to intermediate registers, resets all counters, initiates a new frame recording, and initiates readout of all data in the intermediate registers.
B. Channel Counters and Read-Out Function
There are a total of 160 event counter modules in the ASIC, one for each analog channel. The overall architecture of such a module is shown in Fig. 9 . One eight bit counter is associated with each input level from the analog part. The priority decoder is responsible for making sure that only one counter is actually incremented for each event that occurs. For example, for the event shown in Fig. 5 , only counter 3 would be increased.
The counters are implemented as so-called linear feedback shift registers (LFSR) [26] which have several important advantages over a normal binary counter. The difference between a normal binary counter and an LFSR based counter is shown in Fig. 10 .
The first advantage of an LFSR based counter is that the area is significantly smaller than a binary counter. For the 8-bit counters shown in Fig. 10 , the area of the binary based counter is about 30% larger than the LFSR based counters when synthesized to the 180 nm process used in the ASIC. Since the ASIC contains 1280 such counters, optimizing this area is obviously worthwhile. In addition, the critical path (the path with the largest logical delay) of an LFSR based counter is very short, consisting of only two XOR gates. This can be compared to the critical path of the ripple-carry based binary counter shown in Fig. 10 , where the critical part consists of 8 half-adders. (There are ways to decrease the critical path in a binary counter, but this comes at a cost of additional area usage.) The most important drawback of an LFSR based counter is the need to convert an LFSR coded value into a binary value before any arithmetic can be done. Therefore, it does not make sense to use LFSR coded values unless the area cost of the converter can be amortized over many counters. (For 8-bit counters, the break-even point seems to be at roughly 40 LFSR counters when synthesized to the 180 nm process used here.) However, for simplicity reasons, in the current version of the ASIC, no such converter is included. Instead the conversion is performed off-chip, by a supporting FPGA circuit.
Another drawback of LFSR based counters is that an N-bit counter can only reach $2^N - 1$ states. This is not an issue in the present ASIC, as it is expected that a single counter in the ASIC will not need to count to more than about 150 anyway (as described in Section II).
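To make the LFSR counting and the off-chip conversion concrete, the sketch below models an 8-bit Fibonacci LFSR and a lookup table that maps each state back to a binary count. The feedback polynomial $x^8 + x^6 + x^5 + x^4 + 1$ and the reset state are assumptions chosen because they give a maximal-length sequence; the text does not state which polynomial or reset value the ASIC uses, and all names are hypothetical.

```python
# Sketch of an 8-bit LFSR event counter with table-based decoding.
# Polynomial and reset state are assumptions, not taken from the ASIC.
TAP_SHIFTS = (0, 2, 3, 4)   # taps for x^8 + x^6 + x^5 + x^4 + 1
RESET_STATE = 0x01          # any non-zero state works for an XOR-based LFSR


def lfsr_next(state):
    """Advance the 8-bit LFSR by one event."""
    bit = 0
    for shift in TAP_SHIFTS:
        bit ^= (state >> shift) & 1
    return (state >> 1) | (bit << 7)


def build_decode_table():
    """Map every reachable LFSR state to the number of events counted."""
    table = {}
    state, count = RESET_STATE, 0
    while state not in table:
        table[state] = count
        state = lfsr_next(state)
        count += 1
    return table


DECODE = build_decode_table()


def count_events(n_events):
    """Return the LFSR state after n_events increments from reset."""
    state = RESET_STATE
    for _ in range(n_events):
        state = lfsr_next(state)
    return state


if __name__ == "__main__":
    raw = count_events(150)
    print(f"raw LFSR value 0x{raw:02x} decodes to {DECODE[raw]} events")
    print(f"distinct states: {len(DECODE)}")
```

The four taps are combined by a two-level XOR tree, which is in line with the short critical path mentioned above, and the table covers the $2^8 - 1$ reachable states. A supporting FPGA could hold such a table in a small ROM.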
In addition to the counters, each channel also contains a status register with two bits. One bit is set if a counter in the channel has overflowed. The other bit is set if the comparator inputs from the analog part do not conform to the expected format, which is ones followed by zeroes (see also Fig. 5, where it is clear that, for example, comp1 should not be set unless comp0 is also set).
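The format check and the priority decoding can be expressed compactly. The following is a behavioral sketch only, with hypothetical function names; index 0 denotes the comparator with the lowest threshold.

```python
# Behavioral sketch of the comparator format check and priority decoder.
def is_thermometer(comp_bits):
    """True if the pattern is ones followed by zeroes (comp_bits[0] is lowest)."""
    seen_zero = False
    for bit in comp_bits:
        if bit and seen_zero:
            return False          # a higher comparator set above a cleared one
        if not bit:
            seen_zero = True
    return True


def priority_decode(comp_bits):
    """Index of the highest comparator that is set, or None if none is set."""
    for index in range(len(comp_bits) - 1, -1, -1):
        if comp_bits[index]:
            return index
    return None


if __name__ == "__main__":
    good = [1, 1, 1, 1, 0, 0, 0, 0]
    bad = [0, 1, 0, 0, 0, 0, 0, 0]
    print(is_thermometer(good), priority_decode(good))
    print(is_thermometer(bad), priority_decode(bad))
```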
To allow for continuous event acquisition, there is a possibility to sample the values of all counters and the status register and store those into intermediate registers (shown to the right in Fig. 9 ). This means that readout of all counters and status registers can be performed without stopping event acquisition.
At a READOUT command, the following procedure is followed: First the current values of the LFSR counters are moved to intermediate registers (see Fig. 9). Then all LFSR counters are reset, and the readout of all intermediate registers is initialized. Next, offset calibration of the analog front-end and the comparators is performed. When offset calibration ends, the analog part starts the acquisition of a new frame. During the readout process, all intermediate register values appear at the two LVDS outputs following a simple serial protocol. For each channel, 68 bits are transmitted (8 × 8 = 64 counter bits, 2 status bits, and 2 parity bits). The total number of bits to be transmitted is therefore 160 × 68 = 10 880. This needs 27.2 µs (13.6 µs) at 100 MHz (200 MHz) clock frequency. The ASIC thus supports frame rates of 37 or 74 kframe/s.
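The frame time and frame rate follow directly from the per-channel bit count and the four bits transmitted per clock cycle. The sketch below reproduces this arithmetic; the function name and parameter names are hypothetical.

```python
# Sketch of the frame readout time calculation; names are illustrative only.
def frame_readout(clock_mhz, channels=160, bins=8, counter_bits=8,
                  status_bits=2, parity_bits=2, bits_per_clock=4):
    """Return (bits per channel, total bits, readout time in µs, kframe/s)."""
    bits_per_channel = bins * counter_bits + status_bits + parity_bits
    total_bits = bits_per_channel * channels
    time_us = total_bits / (bits_per_clock * clock_mhz)
    rate_kframes = 1e3 / time_us
    return bits_per_channel, total_bits, time_us, rate_kframes


if __name__ == "__main__":
    for clock_mhz in (100.0, 200.0):
        per_ch, total, t_us, rate = frame_readout(clock_mhz)
        print(f"{clock_mhz:.0f} MHz: {per_ch} bits/channel, {total} bits, "
              f"{t_us:.1f} µs, {rate:.0f} kframe/s")
```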
C. Test Functions
There are several test functions implemented to ease test of the chip. These functions facilitate isolation of errors between the analog and the digital block, as well as continuous debugging of a single analog channel.
It is possible to direct four of the comparator outputs from a selected channel directly to the LVDS outputs. It is also possible to direct the output of the priority decoder shown in Fig. 9 directly to the LVDS outputs. To test the digital part of the design, it is also possible to disable the inputs from the analog part and connect these inputs to a test circuit. This test circuit allows any value to be sent to any channel by sending the appropriate commands over the serial control interface.
In addition, the test circuit can also be connected to a built-in self-test (BIST) module which will send a known pseudo-random sequence to all digital inputs. This is used to quickly find most manufacturing faults in the digital part of the ASIC. First, all counters are reset. Second, BIST mode is enabled for a fixed amount of clock cycles. Third, all counters are read out and compared to the expected values. The BIST mode can also be used in a production environment to ascertain that the digital part has not malfunctioned. Finally, there is also a loopback mode which is automatically enabled by the reset signal. When the ASIC is in this mode, all commands sent to the serial control interface will be automatically routed to the LVDS outputs. This ensures that the supporting circuits connected to the ASIC can calibrate any delay values required to operate the (relatively) high speed LVDS interface.
D. LVDS Interfaces
The LVDS interfaces are custom designed to fit communication with a standard FPGA LVDS interface. It is not possible to completely fulfill the LVDS specification regarding common mode level with a minimum supply voltage of 1.45 V, but we take advantage of the fact that the FPGA side of the interface is designed according to the standard and is therefore very tolerant to offsets in common mode voltage. In our implementation, the output common mode voltage is adjustable from 0.8 to 1.2 V and the interface has proven to give very reliable communication up to the specified 200 MHz clock frequency. The LVDS interface minimizes the risk of noise feedback to the sensitive analog inputs because of low signal swing differential signaling. It has a separate power supply in order to avoid noise coupling to and from the LVDS interface.
V. CHIP IMPLEMENTATION
The chip was implemented in a standard 180 nm CMOS process. In order to save power, the supply voltage was chosen to 1.5 V instead of 1.8 V. The analog channels, each with a footprint of , were grouped into groups of 20, which were implemented together, sharing bias circuits. Four such groups were implemented in the lower part of the chip and four in the top part. Common analog circuitry was implemented between these blocks of four (see Fig. 11 ). The common analog block includes eight DACs and a common reference voltage generator, all distributed to all channels.
The digital block was synthesized from Verilog into a given frame. Special care was taken to carefully specify a robust interface between the analog and the digital block, as it was not feasible to co-simulate the whole chip after layout. Instead we relied on large timing margins of the interfaces.
As the total power consumption is relatively large (800 mW) and as we run the digital part simultaneously with the very sensitive analog part, we have put a large effort in designing and verifying the supply networks. The analog and digital supplies are separated, with separate on-chip decoupling. Also pad-rings (supporting protection circuits) are separated from the cores, and the LVDS supply is separate. The analog and the digital blocks are clearly separated in layout, and there is no metal connection between analog and digital ground (only resistive connection via substrate). Careful simulations of the substrate system, including models of external decoupling, indicate a maximum ripple level of 900 peak-to-peak at the filter output caused by the digital block.
All analog inputs (single-ended channel inputs) are located along one edge of the chip, in order to keep them apart from any other wiring (supplies, digital signals, etc.). In order to accommodate 160 pads for stud-bonding, the input pads are organized in 5 columns with 32 pads in each, with a pitch of 150 µm and a pad size of 75 µm. The top and bottom edges of the chip contain supply pads only, with analog closest to the input pads and digital closest to the opposite edge. One reason to locate analog close to the inputs is that analog constitutes the AC reference for the inputs. Finally, all digital I/O, other control signals, test outputs, and analog reference bias are located at the edge opposite to the input pads.
The total number of transistors was estimated at 1.6 M. The final layout had an area of 5 mm × 6.6 mm, and the total power consumption was estimated at 800 mW at 200 MHz clock frequency, or 5 mW per channel, of which 4.1 mW is for the analog block.
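As a quick consistency check on the figures above, the following Python sketch splits the estimated 800 mW into the analog share and the remainder, and derives the average power density over the 5 mm × 6.6 mm layout. Attributing the remainder to the digital block, LVDS interfaces, and common circuitry is an assumption, since the text only states the total and the per-channel analog figure. All names are illustrative.

```python
# Estimated figures quoted in the text (200 MHz clock frequency).
N_CHANNELS = 160             # channels on the chip
P_TOTAL_MW = 800.0           # estimated total power
P_ANALOG_PER_CH_MW = 4.1     # analog share per channel
CHIP_AREA_MM2 = 5.0 * 6.6    # final layout area


def power_budget(p_total_mw=P_TOTAL_MW, n_channels=N_CHANNELS,
                 p_analog_per_ch_mw=P_ANALOG_PER_CH_MW,
                 area_mm2=CHIP_AREA_MM2):
    """Return per-channel and per-block power figures in mW."""
    per_channel = p_total_mw / n_channels
    analog_total = p_analog_per_ch_mw * n_channels
    return {
        "per_channel_mw": per_channel,
        "analog_total_mw": analog_total,
        # Assumption: everything that is not analog channel power
        # (digital block, LVDS, common circuitry) is lumped together.
        "other_total_mw": p_total_mw - analog_total,
        "other_per_channel_mw": per_channel - p_analog_per_ch_mw,
        "density_mw_per_mm2": p_total_mw / area_mm2,
    }


budget = power_budget()
for name, value in budget.items():
    print(f"{name}: {value:.2f}")
```

Under these assumptions the non-analog remainder is a minority of the budget, which is consistent with the emphasis the text places on the analog front end.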
VI. MEASUREMENTS
Experimental verification of this chip is not trivial, as we have a large number of single ended very sensitive inputs and at the same time a quite large digital block running at 100-200 MHz clock frequency. We therefore used carefully designed test boards, on which the ASIC is bare bonded. Two test boards were used, one with 4 test signal inputs, and most inputs open, and one with one test signal input and 64 inputs connected to 64 test strip diodes on a single chip, as in Fig. 12 .
On these boards, we have a separate reference plane at the input side of the board, which is connected to a separate analog pad . On this reference plane we then mount electrical input circuits, consisting of a transformer, a termination resistor, and a series resistor (100 kΩ), see Fig. 12. The transformer is used for isolating the reference plane from the external pulse generator ground. We can also place a diode chip on the reference plane and connect it to the ASIC inputs. Two boards were used, one with 4 transformer inputs and one with one transformer and a photodiode with 64 strips. For analog and , we use 20 bond wires, and for digital and 10 bond wires. The clock LVDS input is terminated on the board, whereas the LVDS data ports are terminated in the chip. All digital data are connected to a flat cable contact. Clock and digital I/O are managed by a separate FPGA board with a computer interface over USB. On the computer, Matlab [27] is used as the driver.
All measurements below were performed at 100 MHz clock frequency and with a supply voltage of 1.55 V. Initially, we measured the current consumption to be 430 mA, of which 100 mA was for the digital part. We also checked reference voltage and DAC linearity. DAC linearity was somewhat worse than expected (see Section VII), so in most cases, measured data are corrected using a measured DAC calibration curve.
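The measured currents translate into power as in the short Python sketch below. It assumes that the quoted 430 mA is the total current drawn at the 1.55 V supply and that the non-digital remainder is all analog; neither split is stated explicitly in the text, and the names are illustrative.

```python
# Measured values quoted in the text (100 MHz clock frequency).
SUPPLY_V = 1.55         # supply voltage during measurements
I_TOTAL_MA = 430.0      # total measured current
I_DIGITAL_MA = 100.0    # part of the current drawn by the digital block
N_CHANNELS = 160


def power_mw(current_ma, supply_v=SUPPLY_V):
    """Power in mW from a current in mA and a supply voltage in V."""
    return current_ma * supply_v


p_total_mw = power_mw(I_TOTAL_MA)
p_digital_mw = power_mw(I_DIGITAL_MA)
# Assumption: everything that is not digital is attributed to analog.
p_analog_mw = power_mw(I_TOTAL_MA - I_DIGITAL_MA)
p_per_channel_mw = p_total_mw / N_CHANNELS

print(f"total   {p_total_mw:.1f} mW")
print(f"digital {p_digital_mw:.1f} mW")
print(f"analog  {p_analog_mw:.1f} mW")
print(f"per channel {p_per_channel_mw:.2f} mW")
```

These numbers are not directly comparable with the 800 mW estimate, which refers to a 200 MHz clock frequency.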
The 4 channels connected to transformers were used to measure gain and linearity. Here a calibrated pulse generator, with pulse length 10 ns and a repetition frequency of 1.003 MHz, was used as input to the transformer. In order to compensate for the transformer leakage inductance, a series inductance was connected to the input of the transformer. For the gain measurements, DAC0 is set to a low value (just above the noise level), DAC1 is swept, and DAC2-7 are set to a high value. We selected , (corresponding to a delay after pulse detection of 5 clock cycles, which equals 50 ns at 100 MHz, see Fig. 5), and . For each pulse amplitude, we thus sweep DAC1, fit the count values versus DAC1 voltage to a complementary error function, and record the mean. The results of the gain measurements are shown in Fig. 13. Here the input charge was converted into equivalent energy through the formula $E = (Q/q)\,\varepsilon$ (5), where $Q$ is the input charge ($Q = I\,t_p$, with $I$ the current pulse amplitude and $t_p$ the pulse length), $q$ is the elementary charge, and $\varepsilon$ is the ionization energy needed to create one electron-hole pair in silicon. We note a very linear behavior (linearity better than 1.5%) and good homogeneity (gain standard deviation of about 0.5%) between the four channels measured. The measured average gain is 2.10 mV/keV.
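The charge-to-energy conversion described above can be sketched in Python as follows. The pair creation energy of 3.6 eV is an assumed textbook value for silicon, not a number taken from this paper, and the 0.2 µA pulse amplitude in the usage lines is a made-up example. Function names are illustrative.

```python
ELEMENTARY_CHARGE_C = 1.602176634e-19   # exact SI value, in coulombs
EPSILON_SI_EV = 3.6                     # ASSUMED pair creation energy in Si
GAIN_MV_PER_KEV = 2.10                  # measured average gain from the text


def pulse_to_energy_kev(current_a, pulse_length_s,
                        epsilon_ev=EPSILON_SI_EV):
    """Equivalent photon energy (keV) of a rectangular current pulse."""
    charge_c = current_a * pulse_length_s          # injected charge
    n_pairs = charge_c / ELEMENTARY_CHARGE_C       # electron-hole pairs
    return n_pairs * epsilon_ev / 1e3              # eV -> keV


def expected_amplitude_mv(energy_kev, gain_mv_per_kev=GAIN_MV_PER_KEV):
    """Filter output amplitude predicted by a purely linear gain."""
    return energy_kev * gain_mv_per_kev


# Hypothetical example: 0.2 uA amplitude, 10 ns pulse length (2 fC).
energy_kev = pulse_to_energy_kev(0.2e-6, 10e-9)
amplitude_mv = expected_amplitude_mv(energy_kev)
print(f"{energy_kev:.2f} keV -> {amplitude_mv:.1f} mV")
```

A 2 fC test pulse therefore lands in the middle of the 30-120 keV range used in CT, which is why pulse amplitudes of this order are convenient for gain calibration.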
Next we measured noise and offset of all channels for . This was done by setting , , allowing a new measurement every fourth clock cycle, sweeping DAC0, and directly reading out the counter0 values after a fixed measurement time of 8.2 for each DAC0 value. We then plotted the normalized count value (normalized to its maximum value; all values are an average of 40 measurements) versus DAC0 voltage and fitted the curve to a complementary error function in order to estimate the mean value (offset) and standard deviation (noise), see Fig. 14.
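The extraction of offset and noise from a threshold scan can be illustrated with the Python sketch below. It does not reproduce the authors' fitting procedure: instead of a nonlinear least-squares fit, it uses a probit linearization, which is an equivalent way of fitting a complementary error function when the data are clean. The scan data are synthetic, and all names and values are assumptions.

```python
import math
from statistics import NormalDist


def s_curve(v, offset, noise):
    """Ideal normalized count versus threshold voltage v."""
    return 0.5 * math.erfc((v - offset) / (math.sqrt(2.0) * noise))


def fit_s_curve(thresholds, norm_counts, lo=0.02, hi=0.98):
    """Estimate (offset, noise) from a normalized threshold scan.

    Since y = 1 - Phi((v - offset) / noise), the transformed value
    z = Phi^-1(1 - y) is a straight line in v with slope 1/noise.
    Points outside (lo, hi) are dropped because Phi^-1 diverges there.
    """
    std_normal = NormalDist()
    pts = [(v, std_normal.inv_cdf(1.0 - y))
           for v, y in zip(thresholds, norm_counts) if lo < y < hi]
    if len(pts) < 2:
        raise ValueError("need at least two points on the transition")
    n = len(pts)
    mean_v = sum(v for v, _ in pts) / n
    mean_z = sum(z for _, z in pts) / n
    sxx = sum((v - mean_v) ** 2 for v, _ in pts)
    sxz = sum((v - mean_v) * (z - mean_z) for v, z in pts)
    slope = sxz / sxx                      # ordinary least squares
    intercept = mean_z - slope * mean_v
    return -intercept / slope, 1.0 / slope


# Synthetic scan: offset 10 mV, noise 1.5 mV RMS, 0.25 mV DAC steps.
scan_mv = [0.25 * i for i in range(81)]
counts = [s_curve(v, 10.0, 1.5) for v in scan_mv]
offset_mv, noise_mv = fit_s_curve(scan_mv, counts)
print(f"offset {offset_mv:.3f} mV, noise {noise_mv:.3f} mV RMS")
```

With real, noisy count data a weighted nonlinear fit is usually preferable, because the linearization amplifies statistical errors near the tails of the curve.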
In Fig. 15 , we show the noise for all channels. Channels connected to diodes are separated from open channels. We note a noise level of about 1.62 mV RMS for channels connected to diodes and 1.45 mV RMS for open channels. With the measured gain above, this corresponds to 0.77 and 0.69 keV RMS, respectively, or to and electrons, respectively.
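The conversion from output noise voltage to energy and to equivalent noise charge can be retraced with a few lines of Python. The pair creation energy of 3.6 eV is an assumption (a common textbook value for silicon); with it, the diode-connected channels come out at about 214 electrons, the figure quoted in Section VII. Names are illustrative.

```python
GAIN_MV_PER_KEV = 2.10   # measured average gain from the text
EPSILON_SI_EV = 3.6      # ASSUMED pair creation energy in silicon


def noise_mv_to_kev(noise_mv, gain_mv_per_kev=GAIN_MV_PER_KEV):
    """RMS noise referred to photon energy."""
    return noise_mv / gain_mv_per_kev


def noise_kev_to_electrons(noise_kev, epsilon_ev=EPSILON_SI_EV):
    """Equivalent noise charge in electrons RMS."""
    return noise_kev * 1e3 / epsilon_ev


for label, noise_mv in (("diode", 1.62), ("open", 1.45)):
    noise_kev = noise_mv_to_kev(noise_mv)
    enc = noise_kev_to_electrons(noise_kev)
    print(f"{label}: {noise_kev:.2f} keV RMS, {enc:.0f} electrons RMS")
```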
We also measured the offset spread between channels (spread of the mean in Fig. 14), see Fig. 16. The spread is somewhat large (of the same order as the noise), see Section VII. The average temperature dependence of the offsets of the open inputs was 34 between 30 and 60 .
The filter reset mechanism was verified in the following way. Without filter reset, two pulses close together will add, as shown in Fig. 8. This was investigated by applying a double pulse to the input, sweeping DAC1 as explained above, and measuring the highest threshold still giving counts. In Fig. 17, we show this threshold versus pulse distance. We note that for pulses closer than about 100 ns, the amplitude is severely disturbed. This agrees well with theoretical results for , and is therefore also a verification of the filter peak time. With filter reset, we performed a similar measurement. In Fig. 18, we show the observed count value versus DAC1 voltage for three different cases. It is clearly shown that the observed threshold is not changed at a pulse distance of 60 ns and that we distinguish both pulses in this case (double pulse count is double 
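The double-pulse measurement without filter reset, where the highest threshold still giving counts is recorded versus pulse distance, can be illustrated with a toy pile-up model in Python. The filter topology is not given in this section, so a first-order CR-RC shaper with a 40 ns peak time is assumed purely for illustration; the real filter will have a different tail, and the numbers below should not be read as predictions for Fig. 17. All names are hypothetical.

```python
import math

PEAK_TIME_NS = 40.0   # peak time quoted in the paper


def shaped_pulse(t_ns, peak_time_ns=PEAK_TIME_NS):
    """Unit-amplitude response of an ASSUMED CR-RC shaper."""
    if t_ns <= 0.0:
        return 0.0
    x = t_ns / peak_time_ns
    return x * math.exp(1.0 - x)   # equals 1 at t = peak time


def apparent_amplitude(distance_ns, peak_time_ns=PEAK_TIME_NS,
                       step_ns=0.1):
    """Maximum of two summed unit pulses separated by distance_ns.

    Without a filter reset the tail of the first pulse adds to the
    second one, so the highest threshold still giving counts rises.
    """
    t_end = distance_ns + 10.0 * peak_time_ns
    n_steps = int(round(t_end / step_ns))
    return max(
        shaped_pulse(i * step_ns, peak_time_ns)
        + shaped_pulse(i * step_ns - distance_ns, peak_time_ns)
        for i in range(n_steps + 1)
    )


for distance in (0.0, 60.0, 100.0, 300.0, 1000.0):
    print(f"{distance:6.0f} ns: {apparent_amplitude(distance):.3f}")
```

In this model the apparent amplitude falls from twice the single-pulse value at zero distance toward the single-pulse value at large distances, which is the qualitative behavior a filter reset is meant to remove.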
VII. DISCUSSION
The results presented here are all from first silicon. Predicted performance is largely met. The gain was measured to be 2.1 mV/keV, which is equal to the predicted value. Peak times also agree with predicted values, as far as verified in Fig. 17. The measured noise level is on average electrons for an open input and electrons with an input capacitance of about 3 pF. We estimate that the latter case corresponds to a total capacitance of about 5 pF, so 214 should be compared to the target of 300 electrons. See Table II.
Some issues have been detected. DAC linearity and DAC-to-DAC interaction are unsatisfactory. An analysis shows that this is caused by a wire with too high resistance, which is easy to correct in a new design. Offset variation between channels is somewhat too large. We believe this is related to the offset calibration mechanism (Section III-C), where switching noise may be picked up by the offset calibration switch in a partly stochastic manner (offset variation between channels appears non-systematic). The rate-dependent DC offset discussed in Section III-C is also related to the offset calibration mechanism. We expect that both these problems can be solved by a modified offset calibration scheme.
VIII. CONCLUSION
We have presented the design and measurements of a high-rate energy-resolving photon-counting ASIC with 160 channels aimed for spectral computed tomography. This ASIC implements 8 programmable energy bins with an 8-bit counter in each, which to the authors' knowledge is the largest number of energy bins implemented in this class of read-out circuit. The thermal noise level is at the same level as other designs. The power consumption is close to the theoretical minimum, given by noise level and input capacitance. The maximum count rate and the maximum frame rate are to the authors' knowledge the highest reported for this class of read-out circuit. This is in spite of the relatively large peak time of 40 ns that is required to properly integrate all charge from a silicon detector. Finally, due to a new filter reset mechanism, we expect a minimum of spectrum distortion at high rates.
