Abstract-A digitizer designed to read out column-parallel charge-coupled devices used for high-speed X-ray imaging is presented. The digitizer is included as part of the High-Speed Image Preprocessor with Oversampling integrated circuit. The digitizer module comprises a multiplexed, oversampling, 12-bit, 80 MS/s pipelined analog-to-digital converter and a bank of four fast-settling sample-and-hold amplifiers to instrument four analog channels. The analog-to-digital converter multiplexes and oversamples to reduce its area to allow integration that is pitch-matched to the columns of the imager. Novel design techniques are used to enable oversampling and multiplexing with a reduced power penalty. The analog-to-digital converter exhibits 188 μV-rms noise, which is less than 1 LSB at a 12-bit level. The prototype is implemented in a commercially available 65-nm CMOS process. The digitizer will be applied to the development of a proof-of-principle 2D, 10 Gigapixel/s X-ray detector.
I. INTRODUCTION

EMERGING particle detection and scientific imaging applications require greatly increased readout rates to enable the capture of dynamic processes. Future X-ray light sources, for example, will require instrumentation that digitizes at pixel rates in excess of 100 Gpixel/s [1]. A significant research and development effort is underway to develop application-appropriate solid-state imagers and their associated readout electronics. The circuit described leverages 65 nm technology and novel design techniques to enable a column rate of 10 MHz, corresponding to a system pixel rate of 10 Gpixel/s (10 000 fps for a 1 Mpixel square sensor) [2]. The circuit is envisioned for use as a platform for the accelerated development of detector readout ICs. Its first application is in the High-Speed Image Preprocessor with Oversampling (HIPPO), a highly integrated column-parallel Charge-Coupled Device (CCD) readout Application Specific Integrated Circuit (ASIC) whose system aspects have been described [3]. This paper is organized as follows: Section II presents the digitizer requirements, Section III describes the circuit architecture, and circuit testing is discussed in Section IV.
II. CCD DIGITIZER REQUIREMENTS
In order to reduce system cost and complexity, the ASIC readout channels are required to match the 50-μm column pitch of a column-parallel, thick, fully depleted X-ray CCD prototype. To provide sufficient dynamic range, the digitizer is required to combine 12-bit resolution and noise with 10-bit accuracy. Traditionally, pipelined ADCs are specified for enhanced linearity relative to their noise performance because of the significant linearity requirements for digitizers used in digital communications. In scientific imaging, however, noise is often more important than linearity when the imager is deployed in situations where it is not expected to simultaneously image bright and low-light areas. The 80 MS/s sampling rate requirement is due to multiplexing and oversampling, as discussed below. The requirements are summarized in Table I. The full-scale range is chosen for compatibility with 65-nm CMOS technology and the serial output is specified by a desire to reduce output pin count.
A block diagram of a possible application system for HIPPO is shown in Fig. 1. The CCD control Field Programmable Gate Array (FPGA) generates multi-phase digital clocks for the clock driver chip. These digital clocks are converted in the clock driver chip to high-voltage clocks suitable for driving the CCD. The CCD control FPGA also generates a master clock and convert signal for the HIPPO ASIC in order to synchronize the data transfer across the system. The readout FPGA receives the data from the HIPPO ASIC. An analog timing monitor output from the HIPPO ASIC is used to provide a fine control to the CCD clocks to improve their synchronization with the HIPPO clocks. Both the CCD control FPGA and the readout FPGA share a common crystal-derived time base.
0018-9499/$31.00 © 2012 IEEE
III. CIRCUIT ARCHITECTURE
A block diagram of a HIPPO IC module is shown in Fig. 2. The implemented ASIC comprises 16 channels which are organized as four distinct 4-channel modules. Each readout module consists of four preamplifiers, four correlated double-sampling (CDS) circuits, four sample-and-hold amplifiers (SHAs), and a multiplexed pipelined ADC. The ADC is multiplexed between the four channels to enable channel pitch matching with the 50-μm CCD columns. The ADC data are serialized to save pin count and are sent off chip at 480 Mb/s.
A block diagram of the ADC is shown in Fig. 3. The ADC consists of 11 identical stages, each containing an analog-to-digital subconverter (ADSC), a digital-to-analog subconverter (DASC) and an interstage amplifier. The ADC uses the Redundant Signed Digit algorithm to relax the requirements of many of the analog components [4]. Each stage processes the quantization error of the previous stage, and each stage provides one effective bit (two for stage 11) after redundancy elimination. The redundancy provided by the Redundant Signed Digit algorithm allows the accuracy of the ADC to be determined almost entirely by the accuracy of the interstage amplifier because a three-level coarse DAC is inherently linear in a differential sense and large offsets can be tolerated in the coarse ADC [4]. In the prototype the stages are not scaled in order to reduce design time. In future versions the downstream stages can be scaled to reduce power consumption and area without compromising performance.
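The offset tolerance described above can be illustrated with a short behavioral model. This is a minimal sketch and not the HIPPO implementation: it assumes the textbook 1.5-bit-per-stage form of the Redundant Signed Digit algorithm (comparator thresholds at ±Vref/4, interstage gain of two), ten such stages followed by an ideal 2-bit final quantizer, and a normalized input range of [-Vref, Vref). The function name and the offset model are hypothetical.

```python
import math

def rsd_pipeline_convert(v, vref=1.0, n_rsd_stages=10, offsets=None):
    """Behavioral model: v in [-vref, vref) -> integer code of n_rsd_stages + 2 bits.

    offsets[i] shifts both comparator thresholds of stage i (assumed offset model).
    """
    if offsets is None:
        offsets = [0.0] * n_rsd_stages
    residue = v
    acc = 0  # running sum of signed digits, weighted by powers of two
    for i in range(n_rsd_stages):
        # Coarse three-level ADSC decision with (possibly offset) thresholds.
        if residue > vref / 4 + offsets[i]:
            d = 1
        elif residue < -vref / 4 + offsets[i]:
            d = -1
        else:
            d = 0
        # DASC subtracts d * vref; the interstage amplifier has a gain of two.
        residue = 2 * residue - d * vref
        acc = 2 * acc + d
    # Ideal 2-bit final quantizer (no redundancy in the last stage).
    q = min(3, max(0, math.floor((residue / vref + 1) * 2)))
    n_bits = n_rsd_stages + 2
    # Redundancy elimination: combine signed digits and final code into offset binary.
    return 2 ** (n_bits - 1) - 2 + 2 * acc + q
```

In this model, comparator offsets smaller than Vref/4 change the individual stage decisions but leave the final code unchanged, because the residue never leaves the range of the following stage. The accuracy then rests on the interstage gain of two, which is modeled as exact here.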
In a pipelined ADC, the closed-loop gain of the first stage interstage amplifier must be at least as accurate as the resolution remaining after the first stage [5] . In the ADC presented, the first stage resolves one effective bit and the linearity specification for the total ADC is 10 bits. Therefore, the interstage amplifier in the first stage of the ADC must be at least 9-bit accurate. Given the feedback factor of the closed-loop amplifier, this translates to an open-loop gain requirement of approximately 60 dB for the Operational Transconductance Amplifier (OTA) used to implement the interstage amplifier. In the prototype, all OTAs use the fully differential folded-cascode structure shown in Fig. 4 . In 65 nm CMOS, achieving high open-loop gain in one stage is challenging because the gain of the individual MOS devices is less than 10 when small devices biased in moderate inversion are used [6] . The open-loop gain cannot be easily increased by using multiple levels of cascoding because of the headroom constraints imposed by the low power supply voltage required by the 65 nm process. To overcome this gain limitation, gain boosting is used in the OTA to regulate the voltage across the cascode devices in order to increase the OTA output resistance [7] . Gain boosting increases the open-loop gain of the OTA by a factor of approximately the gain of the gain-boosting amplifier. The expense of gain boosting is increased power dissipation and more challenging frequency compensation due to the presence of a pole-zero doublet in the transfer function [8] .
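The 60 dB figure can be checked with a short calculation. The sketch below assumes that the static closed-loop gain error of a feedback amplifier is approximately 1/(A·β) and that the feedback factor is β = 0.5, as for an ideal gain-of-two stage with equal capacitors and no parasitics. The value of β is an assumption, since the text does not state it, and the function names are hypothetical.

```python
import math

def required_open_loop_gain_db(accuracy_bits, feedback_factor):
    """Minimum OTA open-loop gain (dB) so that 1/(A*beta) < 2**-accuracy_bits."""
    a_min = 2 ** accuracy_bits / feedback_factor
    return 20 * math.log10(a_min)

def gain_boosted_gain_db(base_gain_db, boost_amp_gain):
    """Approximate gain after boosting: base gain times the boosting amplifier gain."""
    return base_gain_db + 20 * math.log10(boost_amp_gain)

# 9-bit accuracy with an assumed feedback factor of 0.5
gain_9bit_db = required_open_loop_gain_db(9, 0.5)
```

With these assumptions the requirement evaluates to about 60.2 dB, consistent with the approximate value quoted above. A smaller feedback factor, for example due to input parasitic capacitance, would raise the requirement.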
To meet the required noise performance in a reduced silicon area, oversampling is employed. Oversampling entails averaging multiple samples of a signal. Because the signal components of the samples are strongly correlated and the white noise components are uncorrelated, oversampling increases the signal-to-noise ratio of the signal by a factor equal to the square root of the oversampling ratio. A plot of the efficacy of oversampling in reducing the input referred noise of a Nyquist-rate ADC is shown in Fig. 5. The ADC Noise Reduction (NR) in the plot is defined here as the improvement in SNR, expressed in bits, due to the oversampling, or

$$\mathrm{NR} = \log_2\sqrt{\mathrm{OSR}} = \tfrac{1}{2}\log_2(\mathrm{OSR}) \qquad (1)$$

where OSR is the oversampling ratio. Identical performance could be achieved without oversampling by increasing the sizes of the capacitors in the interstage amplifier but at the expense of increased die area. The noise of the ADC is dominated by thermal noise because of the high sampling rate, so 1/f noise is not considered.
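The square-root improvement can be illustrated numerically. The sketch below takes the noise reduction in bits to be the base-2 logarithm of the √OSR improvement in SNR, and checks the averaging argument with a small Monte Carlo run on white Gaussian noise. The function names, trial count and seed are illustrative choices.

```python
import math
import random
import statistics

def noise_reduction_bits(osr):
    """SNR improvement in bits for averaging osr samples with uncorrelated noise."""
    return 0.5 * math.log2(osr)

def averaged_noise_rms(sigma, osr, trials=20000, seed=0):
    """Empirical rms of the mean of osr white Gaussian noise samples."""
    rng = random.Random(seed)
    means = [
        sum(rng.gauss(0.0, sigma) for _ in range(osr)) / osr
        for _ in range(trials)
    ]
    return statistics.pstdev(means)
```

For example, averaging four samples halves the rms noise, a gain of one bit, and averaging sixteen samples gains two bits. The model covers white noise only; correlated noise such as 1/f noise would not average down in this way.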
Significant improvements in noise performance are possible with oversampling. Oversampling is only feasible when the input bandwidth is significantly below the achievable sampling rate of the technology. To generate the required timing to implement oversampling, the HIPPO digitizer is flexibly clocked using the circuit in Fig. 6 . A master clock is sent down independent delay lines that are then used to select different frequencies for the ADC clock and the channel clock under digital control.
The channel clock is the digital clock that controls the front-end circuits of the system [2], such as the preamplifier and the correlated double-sampler. The ADC clock drives the analog-to-digital converter. The channel clock and the ADC clock can be adjusted separately to give an oversampling ratio from 1 (no oversampling) to 32 (maximum oversampling). The key penalties in oversampling using a Nyquist-rate ADC are the reduction in the system input bandwidth and the increased power dissipation for a given sampling throughput.
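The relationship between the two clocks can be sketched as follows. The formula is an inference from the figures quoted in this paper rather than a stated design equation: it assumes that one ADC is shared equally among the four channels of a module, so that the oversampling ratio is the ADC rate divided by the product of the channel count and the channel rate. The function names are hypothetical.

```python
def oversampling_ratio(adc_rate_hz, channel_rate_hz, n_channels=4):
    """OSR seen by each channel when one ADC is multiplexed over n_channels."""
    return adc_rate_hz / (n_channels * channel_rate_hz)

def required_adc_rate(channel_rate_hz, osr, n_channels=4):
    """ADC sampling rate needed for a given channel rate and OSR."""
    return n_channels * channel_rate_hz * osr
```

Under this assumption an 80 MS/s ADC gives an oversampling ratio of 2 at a 10 MHz channel rate and of 4 at 5 MHz, which illustrates the trade between input bandwidth and noise noted above.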
There are at least two reasons why oversampling a multiplexed input signal that is not valid throughout the ADC sampling period is challenging. First, it requires a number of SHAs equal to the number of channels, increasing die area and power dissipation. Second, it requires a SHA fast enough to provide multiple samples of the valid period. In Fig. 2, for example, the SHA input is valid for approximately 10 ns when the channel is operated at 5 MHz. Oversampling this SHA by a factor of four would require it to settle to the desired accuracy in less than 2.5 ns. To relax the SHA requirements while still allowing low-noise oversampling, we have developed a sampling method that adds less noise to the signal than would an analog delay line at a given power dissipation. Oversampling the front-end output directly would require an extremely fast ADC that would dissipate a large amount of power. Instead, a bank of fast-settling SHAs is used to allow for uniform ADC sampling without additional noise, since the bank of SHAs would replace the ADC input SHAs required to implement multiplexing. The sampling is uniform in the sense that the ADC samples periodically. To reduce power, the SHAs can taper their bias currents based on the order in which they are sampled. This increases the settling time of each tapered SHA but has no effect on accuracy as long as the SHA is settled to the required accuracy before sampling, as shown in Fig. 7. Although the rise times of the SHAs in Fig. 7 are quite different, the system accuracy is unchanged by bias tapering because all the SHAs have settled before sampling. The power is reduced because the SHAs no longer consume power to settle quickly when they are not scheduled for sampling for some time.
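The settling budget in the example above can be worked through numerically. The sketch divides the valid window by the oversampling ratio, then estimates the largest time constant that still settles to a given accuracy. The second step assumes single-pole exponential settling and a 12-bit settling target. Both are assumptions made for illustration, and the function names are hypothetical.

```python
import math

def per_sample_settling_window(valid_window_s, osr):
    """Time available per sample when a valid window is oversampled osr times."""
    return valid_window_s / osr

def max_time_constant(settling_window_s, accuracy_bits):
    """Largest single-pole time constant with exp(-t/tau) < 2**-accuracy_bits."""
    return settling_window_s / (accuracy_bits * math.log(2))

window = per_sample_settling_window(10e-9, 4)  # 10 ns window, OSR of 4
tau = max_time_constant(window, 12)            # assumed 12-bit settling target
```

Under these assumptions the 2.5 ns window corresponds to a time constant of roughly 0.3 ns, which indicates why directly oversampling a single SHA is demanding.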
The potential for system-level power reduction that can be achieved through the use of bias tapering can be seen in Fig. 8. The relative power dissipation compared to the power dissipation in which no channels are bias tapered is

$$RP = \frac{n+1}{2n} \qquad (2)$$

where RP is the relative power dissipation and n is the number of parallel channels.
The number of parallel channels here is the number of channels receiving bias tapering. In the prototype, each 4-channel module is bias tapered, which reduces the power dissipated in the SHA bank by 37.5% relative to the case when all SHAs settle with the same time constant (i.e., when the SHAs are not bias tapered).
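The 37.5% figure can be reproduced with a simple model. The sketch below assumes a linear taper, in which the SHA sampled first runs at full bias and each later SHA runs at a bias reduced by 1/n of the full value. This taper law is an assumption chosen because it matches the quoted saving for n = 4; the bias schedule used in the prototype may differ. The function names are hypothetical.

```python
from fractions import Fraction

def linear_taper_biases(n):
    """Assumed bias fractions, in sampling order: n/n, (n-1)/n, ..., 1/n."""
    return [Fraction(n - k, n) for k in range(n)]

def relative_power(n):
    """SHA-bank power relative to n untapered SHAs, under the linear-taper model."""
    return sum(linear_taper_biases(n)) / n

rp4 = relative_power(4)   # Fraction(5, 8)
saving4 = 1 - rp4         # Fraction(3, 8), i.e. 37.5%
```

Under this model the relative power equals (n + 1)/(2n), so the saving approaches 50% as the number of tapered channels grows.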
Even with the introduction of the bias tapering technique, the SHAs still require fast settling and consume a significant amount of power. To enable the fast settling of the SHAs at acceptable power dissipation, the flip-around architecture shown in Fig. 9 is used to maximize the SHA feedback factor [4]. The flip-around SHA uses the same capacitor for acquisition and regeneration of the input signal, and therefore has a feedback factor that is substantially higher than that of a traditional switched-capacitor amplifier. To eliminate signal-dependent offsets, the SHA uses the bottom-plate sampling technique [9]. The input common-mode shorting switches are turned off slightly before the input switches. Therefore, charge injection from the shorting switch is a common-mode signal to first order. In a differential configuration this charge injection is canceled and does not affect the linearity of the circuit. The SHA is clocked such that it tracks the analog front-end output during settling and can sample without spending additional time to acquire the signal. The output common-mode level of the SHA is stabilized by a switched-capacitor common-mode feedback circuit [9].
To minimize design time, the OTA that was designed to meet the requirements for the first stage of the ADC was reused to implement the SHA.
A block diagram of the serializer used in the HIPPO ASIC is shown in Fig. 10 . Because the data rate is limited to 480 Mb/s/port, two serializers are used to output data when the ADC is converting at 80 MS/s. The serializer operates as follows. First, the word to be serialized is latched by a 40-MHz word clock and then introduced into two parallel shift registers. Next, these shift registers are clocked using a 240-MHz bit clock. Finally, the outputs of these shift registers are then multiplexed into an LVDS driver. The bit clock is used to drive the multiplexer so the output bit changes on both the rising edge and the falling edge of the bit clock, giving a data rate of twice the bit clock frequency, or 480 Mb/s. The datapath consists of manually routed standard cells, while the serializer control state machine is constructed with synthesized logic.
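The double-data-rate scheme described above can be modeled behaviorally. The sketch below is not the HIPPO datapath: the bit order (LSB first), the assignment of even and odd bits to the two shift registers, and the assumption that each serializer carries every other ADC word are illustrative choices, and all function names are hypothetical.

```python
def ddr_serialize(word, n_bits=12):
    """Split a word into two shift registers and interleave them on both clock edges."""
    bits = [(word >> i) & 1 for i in range(n_bits)]  # LSB first (assumed order)
    even_reg = bits[0::2]  # assumed to drive the output after the rising edge
    odd_reg = bits[1::2]   # assumed to drive the output after the falling edge
    stream = []
    for even_bit, odd_bit in zip(even_reg, odd_reg):
        stream.append(even_bit)
        stream.append(odd_bit)
    return stream

def ddr_deserialize(stream):
    """Rebuild the word from an LSB-first bit stream."""
    return sum(bit << i for i, bit in enumerate(stream))

def line_rate_bps(bit_clock_hz):
    """Two output bits per bit-clock period."""
    return 2 * bit_clock_hz

def serializers_needed(adc_rate_sps, n_bits, port_rate_bps):
    """Number of ports required to carry the ADC output (integer ceiling)."""
    total_bps = adc_rate_sps * n_bits
    return -(-total_bps // port_rate_bps)
```

With a 240 MHz bit clock the line rate is 480 Mb/s, and an 80 MS/s, 12-bit stream requires two such ports, consistent with the 40 MHz word clock per serializer.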
A die photomicrograph of the HIPPO ASIC is shown in Fig. 11. To maintain a 50-μm channel pitch, the inputs are bonded to staggered pads. Including a test channel and bond pads, the IC is 4.2 mm by 1 mm in size.
IV. CIRCUIT TESTING
The HIPPO ASIC was tested using a custom testboard and a control board based on a commercial FPGA evaluation board. The FPGA evaluation board generates the master system clock and receives and decodes the LVDS outputs of the HIPPO IC. The FPGA also programs the HIPPO configuration register for settings such as channel rate, ADC rate, test-mode status, and fine tuning for the timing generator.
A close-up photograph of the HIPPO test board is shown in Fig. 12 . The die is direct-to-board bonded.
The primary design challenge for the HIPPO ADC was its specification for low-noise operation. To test the noise performance of the ADC, the input was set at a dc level of 0.25 V and continuously converted. The data are plotted in Fig. 13. A Gaussian distribution that is fitted to the data is also plotted. The spread of the output codes can then be interpreted as an indication of the input-referred noise of the ADC. The distribution is skewed because the input voltage was not set precisely at the center of an ADC code. HIPPO was specified for noise less than 200 μV-rms. With a full-scale voltage range of 1 V pp-diff, this equates to less than an LSB at a 12-bit level. The rms noise is 188 μV-rms, or approximately 0.77 LSB.
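The conversion between volts and LSBs used above can be checked directly. The sketch uses the 1 V differential peak-to-peak full scale and the 12-bit level quoted in the text, taking the noise values in microvolts. The helper that estimates rms noise from recorded output codes is a generic illustration of a dc-input noise test, not the authors' analysis code, and all function names are hypothetical.

```python
import statistics

def lsb_volts(full_scale_v, n_bits):
    """Size of one LSB for a given full-scale range and resolution."""
    return full_scale_v / 2 ** n_bits

def noise_in_lsb(noise_v_rms, full_scale_v, n_bits):
    """Express an rms noise voltage in LSBs."""
    return noise_v_rms / lsb_volts(full_scale_v, n_bits)

def rms_noise_from_codes(codes):
    """Standard deviation of output codes recorded with a dc input, in LSBs."""
    return statistics.pstdev(codes)

measured_lsb = noise_in_lsb(188e-6, 1.0, 12)  # measured noise
spec_lsb = noise_in_lsb(200e-6, 1.0, 12)      # specification limit
```

One LSB at the 12-bit level is about 244 μV, so the 200 μV-rms specification is about 0.82 LSB and the measured 188 μV-rms is about 0.77 LSB.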
The measured static linearity of the ADC using a 10-bit LSB is shown in Fig. 14 .
The differential nonlinearity (DNL) is near the design target of LSB. The integral nonlinearity (INL) plot indicates insufficient closed-loop gain and/or incomplete interstage amplifier settling in the ADC. These effects have reduced the closed-loop gain enough that the ADC cannot achieve full 10-bit accuracy from an INL standpoint, while it does achieve 12-bit noise performance. Table II shows the measured performance of the ADC.
V. CONCLUSION
An integrated digitizer suitable for the instrumentation of column-parallel CCDs is presented. The digitizer combines 10-bit differential linearity with 12-bit noise performance and operates up to 80 MS/s. Combined with a charge-sensitive analog front end in the HIPPO ASIC, the digitizer is an enabling device that is a required step towards a proof-of-principle 2D, 10 Gigapixel/s X-ray detector.
