A charge-coupled device (CCD) capable of 200 Mpixels/s readout has been designed and fabricated on thick, high-resistivity silicon. The CCDs, up to 600 μm thick, are fully depleted, ensuring good infrared to x-ray detection efficiency, together with a small point spread function. High readout speed, with good analog performance, is obtained by the use of a large number of parallel output ports. A set of companion 16-channel custom readout integrated circuits, capable of 15 bits of dynamic range, is used to read out the CCD. A gate array-controlled back end data acquisition system frames and transfers images and provides the CCD clocks.
I. INTRODUCTION
Over the past 20 years, the brightness of synchrotron radiation sources has increased by six orders of magnitude, whereas the detectors used in experiments are in many cases unchanged. One of the ubiquitous detectors in synchrotron radiation research is a fiber-coupled phosphor screen, read out by a charge-coupled device (CCD). CCDs have been the scientific imager of choice for more than three decades due to their noiseless, efficient charge transfer, which allows linear recording of signals over a wide dynamic range. The primary drawback of CCDs is the serial nature of their charge transfer mechanism, which tends to be slow.
Our goal was to improve this kind of detector by increasing the CCD readout speed by a factor of 100 while maintaining all of the other excellent characteristics of CCDs. At the same time, fabricating the CCD on thick, high-resistivity silicon opens up the possibility of direct x-ray detection (the CCD is thick enough to absorb essentially all x rays up to 10 keV) and improves the point spread function (PSF) (as the CCD is fully depleted, charge is collected solely by drift, without diffusion).
We describe below the design and fabrication of a 96 port CCD on thick, high-resistivity silicon, as well as a companion 16-channel custom readout integrated circuit and the data acquisition system. First results from tests with x rays are presented.
II. CCD DESIGN
Conventional CCDs achieve higher readout speed with increased clock rates and higher digitization frequency. The speed increase results in a corresponding increase in bandwidth, so that the readout noise increases as $f^{1/2}$. With fixed well depth, this results in an $f^{1/2}$ decrease in dynamic range. An alternate way to increase speed is to increase the number of readout ports. In a single port CCD, the image matrix is surrounded by the parallel clock distribution interconnect on two sides and by the output (serial) shift register with a readout port on one side (see Fig. 1). This is commonly extended to two ports by splitting the serial shift register, adding another output port to the other side, and reading half the CCD out of one port and the other half out of the other port. Duplicating this arrangement on the top of the CCD and splitting the parallel clocking directions results in a four port device. One way to further increase the number of readout ports would be to read out every column [column-parallel CCD (Ref. 1)]. When the pixel pitch is narrower than the width of the output stage, this introduces interconnection problems which are insurmountable in most CCD processes. Alternatively, one could have many short serial shift registers [almost column-parallel CCD (Ref. 2)], but this requires finding a way to intersperse the output stages into the area occupied by the serial shift register.
The approach we take is the latter: an output stage for every ten columns. In order to fit the output stage in the area of the serial shift register, each ten-column serial shift register is compressed by a short (6 pixels) pitch-adapting taper (see Fig. 2). The pixels in the taper are designed to have the same capacitance (and therefore the same properties) as pixels in the imaging matrix. For reasons described below, the pixel pitch is 30 μm, so that the output pitch is 300 μm. The output stages share a common drain connection for every four outputs, so that together with a $V_{DD}$ pad, the resulting 240 μm pad pitch allows simple wire bonding. In conventional scientific CCDs, reset and output $V_{DD}$ pads are not shared in order to minimize cross-talk, whereas for the high density of this CCD, individual power pads would lead to a prohibitive number of power supplies.
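The pitch figures quoted above can be checked with a few lines of arithmetic. The sketch below assumes that each group of four outputs needs one bond pad per output plus one shared $V_{DD}$ pad, which is one reading of the text that reproduces the 240 μm pad pitch; the function names are illustrative only.

```python
# Values quoted in the text.
PIXEL_PITCH_UM = 30          # pixel pitch in micrometers
COLUMNS_PER_OUTPUT = 10      # one output stage for every ten columns
OUTPUTS_PER_GROUP = 4        # outputs sharing a common drain connection

# Assumption: one bond pad per output plus one shared V_DD pad per group.
PADS_PER_GROUP = OUTPUTS_PER_GROUP + 1


def output_pitch_um() -> int:
    """Distance between adjacent output stages."""
    return PIXEL_PITCH_UM * COLUMNS_PER_OUTPUT


def pad_pitch_um() -> float:
    """Average bond pad pitch when a group's pads share the group's width."""
    group_width_um = output_pitch_um() * OUTPUTS_PER_GROUP
    return group_width_um / PADS_PER_GROUP


def outputs_per_side(n_columns: int) -> int:
    """Number of output stages along one edge of the imaging matrix."""
    return n_columns // COLUMNS_PER_OUTPUT


print(output_pitch_um(), pad_pitch_um(), outputs_per_side(480))
```

For a 480-column device this gives 48 output stages per side, i.e., 96 ports when both the top and bottom edges are read out.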
The CCD was fabricated in a process originally developed at LBNL (Ref. 3), where a conventional CCD structure is grown on a high-resistivity silicon wafer. The high resistivity allows the complete volume of the imaging matrix to be depleted so that all of the charge is collected and diffusion is minimized. In addition, the entrance window (backside) is specially designed (Ref. 4) to present a minimum of dead material while maintaining good conductivity. This allows the same device to simultaneously be optically sensitive (with excellent blue and red quantum efficiencies), sensitive to x rays from the VUV to ~10 keV, and sensitive to low energy electrons.
The time needed to read out a CCD with $N_X N_Y$ pixels having $m$ output ports on the top and bottom each is $T = (N_Y/2)[T_V + (N_X/m)T_H]$, where $T_V$ is the parallel clock time and $T_H$ is the time to serially shift one column and digitize the output. In a three-phase CCD (like this one) three parallel clocks must be toggled during time $T_V$, so that $T_V$ should be a small fraction of $(N_X/m)T_H$ in order to not introduce significant dead time. Unlike the serial (mini) shift registers, the parallel clock gates traverse the width of the CCD. The polysilicon gate resistivity is three orders of magnitude higher than that of aluminum. For normal (slow) CCD readout, this is not a concern, but at high speeds the RC time constant (R from the polysilicon gate resistivity and C from the combination of gate overlap and channel capacitance) can make $T_V$ prohibitively long. For this reason, the gates have been metal strapped, whereby (as shown in Fig. 3) each polysilicon gate is covered with metal, and contacted every second channel stop. The lithographic tolerances needed for the contact and metal etching steps result in a minimum metal-strapped pixel pitch of about 25 μm.
The CCD lot was fabricated on high-resistivity 6 in. wafers, 675 μm thick, by Dalsa Semiconductor. Dalsa provides the front-end processing (implants, polysilicon gates), after which thinning, backside processing, and metallization are performed at the LBNL Microsystems Laboratory. The backside processing involves high temperatures, so the aluminum metallization must be performed last. A few control wafers were fully processed by Dalsa (i.e., including metallization). Those wafers can only be frontside illuminated and are 675 μm thick. CCDs processed at LBNL can be frontside or backside illuminated and for this lot were thinned to 200 μm.
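As a rough illustration of the readout time formula, the sketch below evaluates $T$ for a 480 × 480 device with 48 ports per side and with a single port per side. The values of $T_V$ and $T_H$ are assumptions chosen only to land near a 5 ms frame time; they are not measured timings. The last step inverts the formula to show how much faster a single port per side would have to run, and applies the $f^{1/2}$ noise scaling from Sec. II under the further assumption that bandwidth scales as $1/T_H$.

```python
def readout_time(n_x: int, n_y: int, m: int, t_v: float, t_h: float) -> float:
    """T = (N_Y / 2) * [T_V + (N_X / m) * T_H], in the units of t_v and t_h."""
    return (n_y / 2) * (t_v + (n_x / m) * t_h)


def required_t_h(frame_time: float, n_x: int, n_y: int, m: int, t_v: float) -> float:
    """Invert the formula: the T_H needed to reach a given frame time."""
    return (frame_time / (n_y / 2) - t_v) / (n_x / m)


# Illustrative values only (assumptions, not measured timings).
N_X, N_Y = 480, 480
T_V = 0.8e-6   # parallel clock time in seconds
T_H = 2.0e-6   # serial shift plus digitization time in seconds

t_48 = readout_time(N_X, N_Y, 48, T_V, T_H)   # 48 ports on top and bottom
t_1 = readout_time(N_X, N_Y, 1, T_V, T_H)     # one port on top and bottom

# T_H that one port per side would need to match the 48-port frame time,
# and the resulting noise penalty if noise scales as (1 / T_H) ** 0.5.
t_h_single = required_t_h(t_48, N_X, N_Y, 1, T_V)
noise_penalty = (T_H / t_h_single) ** 0.5

print(f"48 ports/side: {t_48 * 1e3:.3f} ms, 1 port/side: {t_1 * 1e3:.1f} ms")
print(f"single-port T_H for equal frame time: {t_h_single * 1e9:.1f} ns")
print(f"relative read noise: {noise_penalty:.2f}x")
```

With these assumed timings, $T_V$ is about 4% of $(N_X/m)T_H$, consistent with the requirement that it be a small fraction of the serial readout time.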
III. CCD SIGNAL PROCESSING INTEGRATED CIRCUIT
The high density of analog outputs from the CCD makes discrete readout impractical. We have therefore developed a custom integrated circuit, fCRIC, to acquire and digitize the CCD signals. The fCRIC is based upon a floating-point architecture developed at LBNL (Ref. 5) and is shown in Fig. 4. Initially, the integrator feedback capacitance is C. If the integrator output exceeds a preset threshold, then capacitors totaling 3C are switched in. The feedback capacitance then goes from C to 4C, which means that the gain has been reduced by a factor of 4. Similarly, if the integrator output again exceeds the preset threshold, then capacitors totaling 4C are switched in, for a total feedback capacitance of 8C. With a feedback capacitance of 8C defined as unity gain, the three effective gains are 8, 2, and 1. As shown in the timing diagram in Fig. 5, first the reset level of the CCD is integrated. The sign of the integration is then changed, and the signal level is integrated. This subtraction of the reset level from the signal level is known as correlated double sampling and is used in CCDs to reduce low frequency noise. The analog result, signal − reset, is then digitized by a 12 bit pipelined analog to digital converter (ADC). The digital result consists of the ADC mantissa and 2 bits representing the gain (1, 2, or 8). The overall gain has been set so that for a nominal 400 ns signal integration time, 0.5 V at the input of the fCRIC corresponds to the full scale of the ADC. With a typical conversion gain of 3.5 μV/$e^-$, full scale corresponds to $2^{17}\,e^-$ on the gain 1 scale, $2^{16}\,e^-$ on the gain 2 scale, and $2^{14}\,e^-$ on the gain 8 scale. With 12 ADC bits, one analog to digital unit (ADU) thus corresponds to $32\,e^-$, $16\,e^-$, or $4\,e^-$.
The fCRIC contains 16 identical channels sharing a common digital back end. The digital circuitry includes a command decoder and four serial data output lines, circuitry to assemble the pipelined ADC output into data words, circuitry to self-calibrate the ADCs, and a digital timing generator. The timing generator consists of a number of 8 bit counters, incrementing on a master clock of up to 250 MHz. Each counter controls the transition of the internal timing signals used in the analog signal acquisition and digitization. The 16 channels also share common analog services, including a bandgap voltage reference. The fCRIC was designed and fabricated in commercial 0.25 μm complementary metal oxide semiconductor and measures 4.8 × 8.3 mm$^2$.
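The floating-point encoding can be illustrated with an idealized model. The sketch below is not the fCRIC logic itself: it assumes that the gain switches exactly when the signal reaches the full scale of the current range, ignores noise and offsets, and uses the full-scale charges quoted above. The names `encode` and `decode` are hypothetical.

```python
ADC_BITS = 12

# Full-scale charge (in electrons) for each effective gain, as quoted in the text.
FULL_SCALE_E = {8: 2**14, 2: 2**16, 1: 2**17}


def electrons_per_adu(gain: int) -> float:
    """Charge represented by one ADU on the given gain range."""
    return FULL_SCALE_E[gain] / 2**ADC_BITS


def encode(charge_e: float) -> tuple:
    """Return (gain, mantissa) for a charge, using the highest gain that fits.

    Idealized assumption: the range switches exactly at full scale.
    """
    for gain in (8, 2, 1):
        if charge_e < FULL_SCALE_E[gain]:
            return gain, int(charge_e // electrons_per_adu(gain))
    return 1, 2**ADC_BITS - 1  # saturated on the lowest gain range


def decode(gain: int, mantissa: int) -> float:
    """Convert (gain, mantissa) back to electrons."""
    return mantissa * electrons_per_adu(gain)


for q in (1000, 20000, 100000):
    gain, mantissa = encode(q)
    print(q, gain, mantissa, decode(gain, mantissa))
```

In this model the smallest step is $4\,e^-$ and the largest representable charge is $2^{17}\,e^-$, a ratio of $2^{15}$, which corresponds to the 15 bits of dynamic range obtained from a 12 bit ADC plus the gain ranging.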
IV. DIGITAL READOUT AND TIMING CONTROL
The back end electronics, shown in Fig. 6, control the digital readout and the timing of the detector. They consist of an interface module, two data modules, and a clock module. The data from the six fCRICs flow into the two data modules where they are buffered and organized. From the data modules the data are sent to the interface module, which buffers them again and then sends them out a camera link port. The camera link data are received by a Dalsa X64-CL full camera link frame grabber, which is plugged into a PCI 64 slot in a computer. The Dalsa frame grabber is capable of acquisition rates up to 680 Mbits/s and is controlled through a user interface that makes calls to Dalsa's application-programming interface.
The functions of the interface module are to generate all of the CCD clock signals, to provide a method of synchronizing external equipment to the detector, to receive data from the two data modules, and to send the CCD data out to a camera link interface. The logic in the interface module that controls the CCD clock signals allows developers to modify many parameters without having to reprogram the field programmable gate array (FPGA). For example, one can change the serial cycle time, the number of pixels read out, the shape of the waveforms, and the clock voltages from a graphical user interface (GUI).
Denes et al., Rev. Sci. Instrum. 80, 083302 (2009)
The functions of the data modules are to receive 12 low voltage differential signaling (LVDS) data streams from three fCRICs, provide a serial bus to program the fCRICs, provide the digital power to the fCRICs, and allow some real time data manipulations. One real time data manipulation is the rearrangement of the data from three fCRICs to match the CCD geometry before sending them to the interface module. The logic implements this descrambling with a ping-pong line buffer: one buffer receives data from the fCRICs while the other buffer is transferring data to the interface module.
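The ping-pong line buffer used for descrambling can be sketched in software as follows. This is a sequential model of what the FPGA logic does concurrently, and the channel ordering passed to the constructor is a hypothetical placeholder: the actual mapping between fCRIC data streams and CCD columns is not given here.

```python
from typing import List, Sequence


class PingPongLineBuffer:
    """Two line buffers: one is filled while the other is read out reordered."""

    def __init__(self, read_order: Sequence[int]):
        # read_order[k] is the index in the scrambled line that should appear
        # at position k of the output line (hypothetical mapping).
        self.read_order = list(read_order)
        self.buffers: List[List[int]] = [[], []]
        self.write_index = 0

    def write_line(self, scrambled: Sequence[int]) -> None:
        """Fill the current write buffer, then swap the roles of the buffers."""
        self.buffers[self.write_index] = list(scrambled)
        self.write_index ^= 1

    def read_line(self) -> List[int]:
        """Read the most recently completed line in CCD geometry order."""
        completed = self.buffers[self.write_index ^ 1]
        return [completed[i] for i in self.read_order]


buf = PingPongLineBuffer(read_order=[2, 0, 1])
buf.write_line([10, 20, 30])   # goes into buffer 0
print(buf.read_line())         # buffer 0 is read while buffer 1 is free to fill
buf.write_line([1, 2, 3])      # goes into buffer 1
print(buf.read_line())
```

In hardware the write into one buffer and the read from the other overlap in time, so the line rate is not reduced by the reordering step.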
The functions of the clock module are to convert the CCD clocks to programmable voltage levels, provide enough current to the clock signals to drive the CCD, provide programmable bias voltages to the CCD, and provide analog power to the fCRICs. The CCD clock signals come from the interface module and are optically isolated from the digital power supply. All of the programmable voltages are set through a serial bus from the interface module and can be modified through the GUI.
V. RESULTS
A prototype CCD with 480 × 480, 30 μm pixels was fabricated, along with various test devices, including a four port version (20 × 480 pixels) for use with conventional CCD readout systems. The prototype CCD implements several new design features compared to previous LBNL CCDs: a large number of output stages (48 on each side), use of the constant-area taper, metal strapping, and large (30 μm) pixels. A front-illuminated four port version was characterized first with slow (100 kHz digitization rate) readout in order to assess the performance of the large, metal-strapped pixels and to compare the behavior to previous LBNL CCDs. As the devices are 675 μm thick, there is considerable bulk thermal leakage current, so that measurements must be performed at low temperatures. With $^{55}$Fe, the conversion gain was measured at 140 K to be 3.3 μV/$e^-$, consistent with expectations. At that temperature, long exposures gave a leakage current of $5\,e^-$/h/pixel, or $2.5 \times 10^{-17}$ A/cm$^2$. Backside illuminated CCDs thinned to 200 μm have also been characterized on ALS Beamline 5.3.1 (Ref. 6) with fluorescence x rays from thin foils. For all of the subsequent measurements, the CCD was read out at 5 ms/frame. Figure 7 shows a histogram of the gain for each output stage for Ag fluorescence photons (22 keV). At 500 mV full scale on the ×1 gain range, the average peak value of 910 ADU on the ×8 gain range corresponds to a conversion gain of about 2.3 μV/$e^-$. This conversion gain is about 1/3 lower than that measured for low-speed operation due to the settling time of the output stage. The single-stage output source follower is biased at 1 mA and has a settling time approaching 1 μs. At low speed, the voltage signal to be integrated is fully settled while integration takes place, but at high speed, it is not, resulting in a lower conversion gain. The output gain uniformity is quite good, as seen in Fig. 7, with a standard deviation of ~3% (which includes the tolerance of the source follower bias resistors, and hence the gain of the output transistor). This demonstrates good matching across the CCD chip.
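The conversion gain and leakage current figures above can be reproduced with a short calculation. The sketch assumes a mean energy of 3.65 eV per electron-hole pair in silicon (a commonly used value that is not stated in the text) and takes the Ag fluorescence energy as 22 keV; the function names are illustrative.

```python
EV_PER_PAIR = 3.65            # assumed mean energy per e-h pair in silicon (eV)
ELECTRON_CHARGE_C = 1.602e-19


def conversion_gain_uv_per_e(peak_adu, gain, photon_ev,
                             full_scale_v=0.5, adc_bits=12):
    """Conversion gain in microvolts per electron from an x-ray peak position.

    full_scale_v is the fCRIC input full scale on the x1 range; on a higher
    gain range the same ADC full scale corresponds to full_scale_v / gain.
    """
    signal_v = peak_adu / 2**adc_bits * full_scale_v / gain
    electrons = photon_ev / EV_PER_PAIR
    return signal_v / electrons * 1e6


def leakage_a_per_cm2(electrons_per_hour, pixel_pitch_um):
    """Leakage current density from a per-pixel dark rate."""
    current_a = electrons_per_hour * ELECTRON_CHARGE_C / 3600.0
    area_cm2 = (pixel_pitch_um * 1e-4) ** 2
    return current_a / area_cm2


gain_uv = conversion_gain_uv_per_e(peak_adu=910, gain=8, photon_ev=22000)
leak = leakage_a_per_cm2(electrons_per_hour=5, pixel_pitch_um=30)
print(f"conversion gain: {gain_uv:.2f} uV/e-")
print(f"leakage: {leak:.2e} A/cm^2")
```

Under these assumptions the 910 ADU peak corresponds to roughly 14 mV at the fCRIC input for about 6000 electrons, which is where the 2.3 μV/$e^-$ figure comes from.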
The spectrum for Ag fluorescence photons, corrected for the output stage gains, is shown in Fig. 8. The measured resolution, 250 eV, is currently read noise limited and will be improved in the next implementation of the readout circuit board. Although the Gaussian PSF for 200 μm fully depleted silicon is around 5 μm, which seems negligible compared to a 30 μm pixel, it is easy to show that for uniform illumination, a 30 μm pixel contains only 75% of the total charge for a 5 μm PSF. This can be seen in Fig. 9, which shows the energy-ordered sum, $S(n) = \sum_{i=1}^{n} P_i$, where $P_i$ is the energy recorded in the $i$th pixel and $P_i > P_{i+1}$. The progression seen in Fig. 9 is consistent with a 5 μm Gaussian PSF. Lastly, different depths of conversion lead to subtle differences in conversion gain. Figure 10 shows the measured conversion gains as a function of energy, and therefore different average conversion depths.
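The 75% figure can be checked with a short calculation, assuming that the quoted 5 μm PSF is the standard deviation of a two-dimensional Gaussian and that photons land uniformly across the pixel. Averaging the contained charge over the landing position gives a closed form in one dimension, which is squared for a square pixel. The sketch also includes the energy-ordered sum $S(n)$; the function names are illustrative.

```python
import math


def contained_fraction_1d(pitch: float, sigma: float) -> float:
    """Mean fraction of a 1D Gaussian charge cloud that stays inside the strip
    where the photon landed, averaged over a uniform landing position."""
    u = pitch / sigma
    edge_loss = math.sqrt(2.0 / math.pi) / u * (1.0 - math.exp(-u * u / 2.0))
    return math.erf(u / math.sqrt(2.0)) - edge_loss


def contained_fraction_2d(pitch: float, sigma: float) -> float:
    """Square pixel and separable Gaussian: the product of two 1D fractions."""
    return contained_fraction_1d(pitch, sigma) ** 2


def energy_ordered_sum(pixel_energies):
    """S(n) for n = 1..N: cumulative sum of pixel energies sorted descending."""
    sums, total = [], 0.0
    for p in sorted(pixel_energies, reverse=True):
        total += p
        sums.append(total)
    return sums


print(f"{contained_fraction_2d(30.0, 5.0):.3f}")
print(energy_ordered_sum([1.0, 5.0, 3.0]))
```

With a 30 μm pitch and a 5 μm standard deviation the one-dimensional fraction is about 0.87, and its square is about 0.75, in agreement with the value quoted in the text.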
VI. CONCLUSIONS
A custom CCD, capable of high-speed readout due to multiple output ports, has been fabricated on thick, high-resistivity silicon. This CCD, in addition to high-speed readout for optical photons, has excellent efficiency for direct x-ray detection. To take full advantage of the high performance, a 16-channel custom readout integrated circuit was developed in parallel with the CCD. The complete system includes a digital back end control, clocking, and data acquisition system, capable of acquiring images at 200 frames/s.
The CCD described here was originally developed for optical imaging, to read out, for example, a fiber-coupled phosphor x-ray detector. It is clear, however, that there are several applications of this CCD in direct x-ray detection, such as an imaging energy-measuring (spectroscopic) detector. We will therefore optimize the readout system for direct x-ray detection in a future iteration: the well depth for x rays is significantly less than for optical photons, so a future version will emphasize higher readout speed at the expense of reduced ADC resolution. For applications requiring larger areas, along with an electronic shutter, a 1000 × 2000 version of this CCD will be fabricated.
ACKNOWLEDGMENTS
We gratefully acknowledge the contributions from Matthew Church for the design of the camera mechanics, Rossana Cambie for the design and processing of the silicon substrate holding the CCD and fCRICs, and Jacque Wycoff, Rhonda Whitharm, and John Eames for assembling the substrates. Tim Madden and Antonino Miceli helped in the design of the clock module and detector characterization. Robert Abiad, George Chao, Dario Gnani, and Brad Krieger were part of the fCRIC design team. We were aided in laboratory testing by Bill Kolbe and Armin Karcher and at ALS Beamline 5.3.1 by Rich Celestre. Steve Holland's profound knowledge and expertise in CCDs were essential for the design of the CCD presented here, and Chris Bebek provided both guidance and a fruitful link to the astronomical CCD community.
