I. INTRODUCTION
AS PROGRESS in solid-state image sensor technology has increased sensor array size, image processing systems must contend with ever increasing data bandwidths. For example, an HDTV image captured at a 30-Hz frame rate (1100 × 1900 pixels × 30 frames/s) represents a 62.7-Mpixel/s data rate, exceeding the capabilities of many current image processing systems. In particular, image compression systems are often required to ease the transmission and storage requirements for such data. Lossless image compression is important in medical and scientific applications. In this work, a hierarchical (pyramid) lossless compression algorithm has been adopted. Compression performance can be improved by encoding the intensity difference between adjacent pixels prior to compression [1]. Typical of many image processing tasks, the hierarchical difference encoding algorithm operates on 3 × 3-pixel blocks (kernels), yet conventional imagers are read out in sequential raster-scan format such that vertical neighbors are separated by a full row of pixels. Thus, the image data must be buffered in digital memory and the pixels subsequently accessed in hierarchical order. Although not overly complex, buffer memory and digital image reorganization circuitry place an additional power, weight, and size burden on the transmission electronics system, an important consideration in a scientific spacecraft, for example. An alternative approach to image reorganization is to preprocess the image data during imager readout utilizing low-power compact circuitry integrated on the sensor array chip. Such integration of image acquisition and image processing circuitry is termed focal-plane image processing and can be used to reduce the performance requirements of downstream processing electronics [2].
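The data-rate figure quoted above follows directly from the array dimensions and the frame rate. A minimal sketch of that arithmetic is given here; the function name pixel_rate is illustrative and does not appear in the original work.

```python
def pixel_rate(rows: int, cols: int, frames_per_second: float) -> float:
    """Return the raw pixel data rate in pixels per second."""
    return rows * cols * frames_per_second


# HDTV example from the text: 1100 x 1900 pixels at 30 frames/s.
hdtv_rate = pixel_rate(1100, 1900, 30)
print(f"{hdtv_rate / 1e6:.1f} Mpixel/s")
```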
This paper describes four CCD image reorganization IC's (as well as a separate CCD imager) that supply downstream image compression electronics with their required data sequences (3 × 3-pixel neighborhoods with the center pixel first) at video rates. Several different approaches to such neighborhood reconstruction providing both row and pixel reorganization have been implemented on the four IC's. Two image reorganization processors are integrated with a 256 × 256 CCD frame-transfer image sensor array on one IC and two are designed for hybridization to a separate 256 × 256 CCD imager within a 68-pin package.
The compression algorithm is described in Section II. Design and operation of the image reorganization circuits for the two approaches are discussed in Section III. Finally, experimental results from the four IC's are presented in Section IV.
II. ALGORITHM
The lossless image compression algorithm utilizes a pyramid-structured progressive transmission technique [3] that exploits the spatial correlation between nearest neighbors in an image. The pyramid consists of subsampled arrays of the original image, in which each pyramid layer is composed of the center pixels of every 3 × 3 neighborhood block from the level below it. Within each level, differences are formed between a center pixel and its surrounding eight neighbors as shown in Fig. 1. These differences form the lossless hierarchical code, which is then sent to a variable-length coder (e.g., Huffman) for compression. Simulation has shown significant improvement in compression performance over raw image data compression (see Table I). Simulation has also revealed that the majority of the compression (80-86%) occurs in the lowest level of the hierarchy (pyramid base, raw image level), that is, by encoding differences between adjacent pixels in each local neighborhood within the image. To facilitate implementation, the current approach encodes only the base layer.
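The base-layer difference code described above can be sketched in software as follows. This is an illustrative model under stated assumptions, not the authors' implementation: the function names are hypothetical, the eight differences are listed in raster order (Fig. 1 may define a different order), and the sign convention is center minus neighbor. The decoder is included only to show that the code is lossless.

```python
# Offsets of the eight neighbors around a block center, in raster order
# (an assumed ordering for illustration).
NEIGHBOR_OFFSETS = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                    if (dr, dc) != (0, 0)]


def encode_base_layer(image):
    """Return (centers, diffs) for every non-overlapping 3x3 block.

    centers[i][j] is the center pixel of block (i, j); diffs[i][j] holds
    the eight center-minus-neighbor differences for that block.
    """
    centers, diffs = [], []
    for bi in range(len(image) // 3):
        center_row, diff_row = [], []
        for bj in range(len(image[0]) // 3):
            r, c = 3 * bi + 1, 3 * bj + 1  # coordinates of the block center
            center = image[r][c]
            center_row.append(center)
            diff_row.append([center - image[r + dr][c + dc]
                             for dr, dc in NEIGHBOR_OFFSETS])
        centers.append(center_row)
        diffs.append(diff_row)
    return centers, diffs


def decode_base_layer(centers, diffs):
    """Rebuild the image from centers and differences (inverse of encode)."""
    image = [[0] * (3 * len(centers[0])) for _ in range(3 * len(centers))]
    for bi, row in enumerate(centers):
        for bj, center in enumerate(row):
            r, c = 3 * bi + 1, 3 * bj + 1
            image[r][c] = center
            for (dr, dc), d in zip(NEIGHBOR_OFFSETS, diffs[bi][bj]):
                image[r + dr][c + dc] = center - d
    return image


image = [[10, 11, 12, 20, 21, 22],
         [13, 14, 15, 23, 24, 25],
         [16, 17, 18, 26, 27, 28]]
centers, diffs = encode_base_layer(image)
```

The centers array is itself the next pyramid layer, so applying the same encoder to it would produce the higher levels that the present hardware omits.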
III. IMPLEMENTATION
Two architectures were utilized in the implementation of the image reorganization IC's: focal-plane integration and hybrid. In the focal-plane architecture, a 256 × 256 buried-channel frame-transfer image sensor is integrated with the image reorganization circuitry. In the hybrid approach, a separate image reorganization IC inputs a conventional raster-scan data stream from a CCD imager and outputs a reformatted pixel stream. In this work, the CCD image sensor and image reorganization IC's were designed for wire bonding into a single 68-pin package. However, the hybrid image reorganization IC's can also input data from commercial CCD imagers and so be readily integrated into existing image processing systems.
The IC's are implemented using a triple-poly double-metal 3-μm buried n-channel CCD process. Three-phase clocking is used in the CCD registers. Pixel size is 15 μm × 15 μm, including pixel isolation accomplished through a 3-μm channel stop. All the CCD structures are symmetrical, allowing bidirectional charge transfer both horizontally within the serial registers, and between the serial and parallel registers. In the focal-plane IC's, the image and frame store sections occupy 3.9 mm × 7.74 mm, with the neighborhood reconstruction circuitry occupying an additional 2% of IC area, or 0.61 mm². The image reformatting circuitry performs two main functions: multiple-row readout and pixel resequencing. Multiple-row readout refers to the reformatting of image data to provide simultaneous access to several rows of data (e.g., three for a 3 × 3 kernel). Pixel resequencing refers to the reordering of pixels to create a desired pixel sequence within a local neighborhood or row. Design and operation of the two types of IC's will now be described.
A. Focal-Plane Approach
In the focal-plane approach, the neighborhood reconstruction circuitry is integrated with a frame-transfer CCD image sensor. The two focal-plane IC's consist of five major portions as shown in Fig. 2. The image sensor is a 256 × 256 three-phase buried n-channel CCD adjacent to a 256 × 256 storage array. A novel multiple-row readout structure appended to the storage array replaces the serial output multiplexer of conventional CCD imagers. This neighborhood reconstruction circuit delivers three lines of pixel data simultaneously to the pixel resequencing block. The pixel resequencer then separates the three rows into 3 × 3 blocks of pixels and outputs a serial stream of nine-pixel blocks. The final section is the sampling output block, which separates the center pixel from its eight surrounding neighbors and provides sequential, differential output and off-chip drive capability. The two IC's differ only with regard to the pixel resequencing block.
IC operation is as follows (see Fig. 2). The image data are moved to the frame storage section following frame integration by rapidly transferring the charge in the parallel registers, as in typical frame-transfer imagers. Following frame transfer, three lines of the image are loaded into the neighborhood reconstruction registers by continued vertical parallel transfer. The three lines are then shifted horizontally by applying a channel stop bias to the vertical transfer gates, providing three rows of data simultaneously to the pixel resequencing block.
The novel neighborhood reconstruction (NR) architecture, which allows both the vertical and horizontal flow of charge, is shown schematically in Fig. 3, and is referred to as SP3 due to the three serial/parallel transfer structures. Unlike previously reported output structures, which utilize multiple serial registers, the new SP3 structure does not require multiplexing of a single row into the multiple registers [4], [5], nor does it require additional implant steps [6]. Contact to the fully symmetric horizontal shift registers' poly electrodes is made from metal bus lines running over the SP3 structure as shown in the photograph of Fig. 4. Although three serial/parallel transfer structures were required to generate the requisite 3 × 3 windows in this particular application, the same structure may be repeated N times to generate an N × M window.
In addition to providing simultaneous readout of multiple rows, the SP3 structure allows incorporation of a parallel diffusion or "dump drain" at the bottom of each vertical column for quickly clearing charge out of the image sensor. In this case, lines of image data are vertically clocked through the parallel and SP3 registers to the reverse-biased dump drains (see metal bus line at bottom of Fig. 4). This feature can be advantageous in scientific imaging applications, where the rapid clearing of an image frame will allow capture of a more interesting transient event.
The next step in the generation of the 3 × 3 encoding windows, following multiple-row readout from the SP3 structure, is separating the three rows into a serial stream of nine-pixel neighborhoods with the center pixel first. The two focal-plane IC's implement such pixel resequencing with two different techniques: pixel delay and wire transfer. The pixel delay technique relies on buffering of the pixels. Three-, six-, and nine-pixel registers are appended to the end of each of the three 256-stage SP3 registers as shown in Fig. 5. Three pixels from each of the long serial shift registers are loaded into each of the three short registers. As the image packets are simultaneously clocked through the appended registers, the output sampling block receives the three center-row pixels first, followed by the bottom and then top row. Therefore, the output block receives the center pixel second, rather than first as required by the difference encoder. Although the output sampling block interchanges the order of the first and second pixel, providing the difference encoder with the proper sequence, a delay of one pixel per nine-pixel neighborhood is introduced. The wire-transfer technique, discussed next, overcomes this limitation.
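The output ordering produced by the pixel-delay technique can be modeled behaviorally as shown here. This is only a sketch of the sequence described above (center row, then bottom row, then top row, followed by the interchange of the first two pixels); it does not model the charge-domain registers, the function names are hypothetical, and the left-to-right order within each row is an assumption.

```python
def pixel_delay_order(top, center, bottom):
    """Order one 3x3 block as the appended delay registers deliver it:
    center-row pixels first, then the bottom row, then the top row."""
    return list(center) + list(bottom) + list(top)


def swap_first_two(stream):
    """Model the output sampling block interchanging pixels one and two,
    so that the block's center pixel leads the nine-pixel sequence."""
    return [stream[1], stream[0]] + list(stream[2:])


top = ["t0", "t1", "t2"]
center = ["c0", "c1", "c2"]  # c1 is the center pixel of the block
bottom = ["b0", "b1", "b2"]

delivered = pixel_delay_order(top, center, bottom)
resequenced = swap_first_two(delivered)
print(delivered)
print(resequenced)
```

In the delivered sequence the center pixel c1 occupies the second position, which is why the interchange step is needed before difference encoding.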
The second approach to pixel resequencing uses the technique of wire transfer [7], which combines elements of both bucket-brigade and CCD devices to effect the reordering of the pixels. In this technique, charge packets are transferred across wires, allowing the crossing of signal paths. A wire-transfer structure is appended to the end of each of the three 256-stage serial SP3 registers with the center and bottom row wires interchanged, such that the center-row pixels are output first. Each 3 × 3 pixel block is wire transferred into a 12-stage SP3 register which receives the packets in parallel (three at a time). The first three packets are transferred in and serially shifted up. The second (central) set of three packets is then wire transferred in, and these three packets along with the first three are shifted down, such that the first stage of the 12-stage shift register contains the central packet. The final three packets are loaded into the 12-stage SP3 register and the nine-pixel neighborhood is transferred in parallel to a conventional parallel-to-serial nine-stage CCD register for serial output. While the nine pixels are being transferred out of the conventional register, the subsequent nine-pixel neighborhood is reordered. In this way, a continuous output data stream of 3 × 3 neighborhood blocks with the center pixel first is generated.
Following neighborhood reorganization, the serial pixel stream is loaded into the output sampling block. This block, used in all four IC's, consists of a first-stage source-follower output amplifier followed by dual sample-and-hold (S/H) circuits (Fig. 7). The center pixel data are sampled by the upper S/H circuit. The remaining eight peripheral pixels are sampled by the lower S/H circuit. The S/H circuits are buffered by a matched pair of source followers with active load transistors, which in turn drive the output pads. Thus, two output data streams are generated. The upper stream contains peripheral pixels output at the imager readout rate (2 Mpixels/s), while the lower contains center pixels output at 1/8 the readout rate. By inputting these streams into a differential operational amplifier, differences between the center pixel voltage and its corresponding peripheral pixel voltages were easily generated at the imager readout rate (2 MHz or 26 frames/s).
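The combined effect of center-first resequencing and differential output sampling can be sketched as a simple stream model. The code is illustrative only: the function names are hypothetical, the peripheral pixels are taken in raster order (an assumption, since the on-chip order depends on the register wiring), and ideal unity-gain subtraction stands in for the off-chip differential amplifier.

```python
def center_first(block):
    """Flatten a 3x3 block into nine values with the center pixel first.
    Peripheral pixels follow in raster order (assumed for illustration)."""
    flat = [pixel for row in block for pixel in row]
    return [flat[4]] + flat[:4] + flat[5:]


def differential_output(stream):
    """Hold the center value and subtract each of the eight peripheral
    values from it, as an ideal differential amplifier would."""
    held_center, peripherals = stream[0], stream[1:]
    return [held_center - p for p in peripherals]


block = [[1, 2, 3],
         [4, 9, 6],
         [7, 8, 5]]
stream = center_first(block)
differences = differential_output(stream)
print(stream)
print(differences)
```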
B. Hybrid Approach
In the hybrid approach, the image reorganization circuitry requires conventional raster-scan data input from a separate imager IC. The hybrid IC's, functioning as image reformatting "black boxes," can therefore be incorporated into image processing systems which utilize conventional front-end image acquisition, such as a CCD video camera. The hybrid IC's perform the same three functions as the focal-plane reorganization processors: simultaneous multiple-row readout, pixel resequencing, and differential output sampling. Analogous to the focal-plane IC's, the two hybrid IC's differ only with regard to the pixel resequencing technique utilized: pixel delay or wire transfer. The pixel resequencing delay structure is the same as that utilized in the focal-plane approach. However, a different wire-transfer architecture was utilized in the hybrid case and will be discussed below.
The hybrid multiple-row readout technique relies on buffering or delay of the image data. Buffering using a CCD delay line has been previously demonstrated both on-chip [8] and off-chip [9]. Although simple to implement, these approaches suffer from additional nonuniform charge-transfer inefficiency (CTI) loss due to transfer through a varying number of additional CCD stages, resulting in image degradation.
IEEE JOURNAL OF SOLID-STATE CIRCUITS. VOL. 27, NO. 3, MARCH 1992
An improved version of the delay approach, schematically illustrated in Fig. 8, has been implemented in the hybrid IC's. In this case, three rows of imager data are sequentially written into three separate 256-stage CCD shift-register delay lines which are jointly clocked to provide simultaneous readout of the three imager rows. After each row (line of image data) is written, the line is transferred into a parallel register for storage during writing of the other rows. This allows one set of clocks to be used for all three serial registers, minimizing pin count and clocking complexity. After the three rows are written, the charge packets are simultaneously transferred back into the serial delay lines where they are clocked out horizontally.
In this way, each pixel undergoes the same number of charge transfers (one row) maintaining uniform CTI losses. In addition, the number of shift registers can be increased to provide access to any number of rows such that much larger neighborhoods (image kernels) can be created.
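The row-buffering function of the hybrid IC's can be modeled behaviorally as follows. The sketch assumes an ideal raster-scan input and ignores the charge-domain details (fill-and-spill input, parallel storage registers, CTI); the function name and the small row width used in the example are hypothetical.

```python
def three_row_readout(raster, width):
    """Group a raster-scan pixel stream into sets of three rows and
    yield, for each set, the three rows read out in parallel.

    Each yielded item is a list of (top, center, bottom) tuples, one per
    column, mimicking three jointly clocked delay lines.
    """
    rows = [raster[i:i + width] for i in range(0, len(raster), width)]
    for r in range(0, len(rows) - 2, 3):
        top, center, bottom = rows[r:r + 3]
        yield list(zip(top, center, bottom))


# Six rows of four pixels each, numbered in raster order.
raster = list(range(24))
groups = list(three_row_readout(raster, width=4))
print(groups[0])
```

Extending the model to larger kernels would only require changing the group size of three, which mirrors the remark that additional shift registers give access to any number of rows.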
Writing of the data is accomplished via the fill-and-spill technique [10]. The separate imager IC is conventionally read out via a source-follower amplifier, which converts the pixel charge packets to the voltage domain. These signal voltages are used to set the voltage on the inverting gate (W1) of a surface-channel fill-and-spill structure (to maximize linearity) located at the beginning of each shift register (Fig. 8). By momentarily forward biasing the reverse-biased input diode (ID), one signal charge packet ($Q_s$) is replicated during each pixel readout cycle: $Q_s = A C_{ox} (V_{W1} - V_{W2})$, where $A$ is the gate area, $C_{ox}$ is the oxide capacitance, and $V_{W1}$ and $V_{W2}$ are the voltages on the first and second metering well, respectively. By proper scaling of the gate area and application of the signal voltage to metering well one, the attenuation and inversion introduced by the imager readout amplifier are canceled. Two of the three input diodes are reverse biased with respect to the channel, while the third is forward biased, so that pixel replication occurs in only the selected shift register, eliminating the need for additional select switch circuitry.
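The fill-and-spill relation above can be evaluated numerically as in the following sketch. The oxide capacitance is treated as a capacitance per unit area, and every numerical value in the example (gate area, oxide capacitance, well voltages) is a hypothetical placeholder chosen for illustration; none is taken from the fabricated devices.

```python
ELECTRON_CHARGE = 1.602176634e-19  # coulombs


def fill_and_spill_charge(gate_area_m2, cox_f_per_m2, v_w1, v_w2):
    """Metered charge Q = A * Cox * (V_W1 - V_W2), in coulombs.

    gate_area_m2 : metering-well gate area A, in square meters
    cox_f_per_m2 : oxide capacitance per unit area Cox, in F/m^2
    v_w1, v_w2   : voltages on the first and second metering wells
    """
    return gate_area_m2 * cox_f_per_m2 * (v_w1 - v_w2)


# Hypothetical values: a 15 um x 15 um gate, Cox of about 3.45e-4 F/m^2
# (roughly a 100-nm oxide), and a 0.5-V difference between the wells.
charge = fill_and_spill_charge(15e-6 * 15e-6, 3.45e-4, 2.5, 2.0)
print(f"{charge / ELECTRON_CHARGE:.0f} electrons")
```

Because the packet size is linear in the well voltage difference, a scaled gate area can compensate for the gain of the imager readout amplifier, as noted in the text.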
This buffering technique can be extended to provide access to every 3 × 3 neighborhood combination (as opposed to 3 × 3 window blocks), if each image pixel is replicated in nine (rather than one) different shift registers. Such a general-purpose image reorganization IC was also fabricated and tested during the course of this work but is reported elsewhere [11]. The same technique can be extended to the generation of larger windows.
Once the three rows of data are made available, they are simultaneously input to the pixel resequencing block. One hybrid IC utilizes the pixel delay technique described in Section III-A for separation of the three rows of pixels into 3 × 3 blocks. The second IC utilizes the wire-transfer technique. Fig. 9 contains a photograph of the wire-transfer structure. Operation is as follows. Three pixel charge packets from each of the three 256-stage serial delay lines are loaded into three sets of short parallel registers. These nine pixels are then simultaneously transferred across wires to a nine-channel parallel register. The center pixel is wired to the first channel of the register, providing the desired resequencing. The reordered nine-pixel neighborhood is then transferred into a serial register for sequential readout. During the nine-pixel serial readout, the subsequent 3 × 3 neighborhood is loaded and reordered by the wire-transfer structure such that a continuous output stream is generated. This stream is then sent to the previously discussed sampling block for differential output off-chip.
IV. EXPERIMENTAL RESULTS
The IC's were tested both electrically and optically. The imaging and processing circuitry was operated with 5-V three-phase clocks, yielding a total estimated dissipated power of 150 μW at a 30-Hz frame rate, not including the off-chip drive amplifiers. These add an estimated 7 mW of power since they were designed to drive an oscilloscope directly (1-MΩ, 22-pF load), but in principle need only drive an A/D converter.
A. Electrical
The circuits were tested electrically at the wafer-probe and chip level at a 277-kpixels/s and at a 2-Mpixels/s output rate, respectively. Quantitative testing on the two focal-plane IC's was facilitated by the addition of a serial-to-parallel charge electrical input structure located at the top of the imaging array. A supplementary output amplifier located opposite the pixel resequencing block was also included on the focal-plane IC's. During normal operation, these test structures are not used, so that the number of required clock and monitoring signals is less than the IC pin count would indicate. Also, in practice, several CCD registers are clocked in tandem by externally connecting corresponding clock phases, such that the total number of control signals supplied to the IC is minimized.
Initially, functional testing at the wafer-probe level was performed by inputting various bit patterns and observing the resultant output sequence. An example of this for the pixel-delay focal-plane IC is shown in the multiple-exposure oscilloscope photograph of Fig. 10. In this case, ten ONES transferred out of each of the three SP3 registers at 277 kpixels/s (3.6 μs/pixel) are delayed by nine, three, and six pixels, respectively. Charge-transfer efficiency (CTE) in the vertical registers as well as in the conventional horizontal (serial-to-parallel) registers was measured to exceed 0.99996/stage at the 277-kpixel/s rate, and CTE in the horizontal SP3 registers was measured to be 0.99994/stage at 83 kpixels/s and 0.9996/stage at 2 Mpixels/s. As expected, the single wire-transfer operation did not introduce any observable degradation in CTE.
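To put the per-stage CTE figures in context, the fraction of a charge packet retained after passing through N stages can be estimated as CTE raised to the power N. The sketch below applies this to the measured SP3 values, assuming a worst-case pixel that traverses all 256 stages of a horizontal register and a CTE that is identical at every stage; both are simplifying assumptions, and the function name is illustrative.

```python
def retained_fraction(cte_per_stage: float, n_stages: int) -> float:
    """Fraction of the original charge packet left after n_stages
    transfers, assuming the same CTE at every stage."""
    return cte_per_stage ** n_stages


# Measured horizontal SP3 CTE values from the text, applied to an
# assumed worst-case path of 256 stages.
for label, cte in (("83 kpixels/s", 0.99994), ("2 Mpixels/s", 0.9996)):
    print(f"{label}: {retained_fraction(cte, 256):.3f}")
```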
Overall output amplifier sensitivity was measured to be 3.2 μV/electron. Intrinsic read noise levels could not be assessed due to test station noise limitations. Matching of the output amplifier pair was measured to be better than 0.05%, with some chip-to-chip variation observed. (Mismatch can be corrected using an off-chip preamplifier prior to A/D conversion, if needed.)
B. Optical
Optical testing was performed at a 2-Mpixel/s output rate (26-28 frames/s). A 28-85-mm Nikon lens was used to focus an image onto the focal-plane (and imager) IC's. Raw outputs from the IC's were first buffered by a preamplifier, which, through gain and offset correction, provided a 0-1.5-V signal which was then inverted and sent to a raster-scan converter for display. To demonstrate functionality of the focal-plane IC's with optical input, a photograph taken from the screen of the scan converter is shown in Fig. 11. The larger image is a portion of the complete 256 × 256 image captured (at a 26-Hz frame rate) by multiplexing the imager output through the upper SP3 register and bypassing the pixel resequencing circuitry. The inset image is composed of one of the eight difference-encoded elements (center pixel minus neighboring diagonal pixel) of each 3 × 3 block, yielding an 80 × 80 subsampled "edge" image also generated at a 26-Hz frame rate.
V. SUMMARY AND CONCLUSION
In summary, four IC's implementing a variety of image reformatting techniques have been successfully demonstrated. The IC's provide real-time image reorganization to enable pyramidal, differential output of image data, thus simplifying downstream electronics and reducing system size, power, and weight of lossless hierarchical compression hardware. Two image reorganization processors were designed for hybridization to a separate imager IC, and two are realized on the focal plane. The hybrid IC's are compact and simple, requiring fewer clock lines than the focal-plane IC's. Although designed for hybridization with an imager array in a 68-pin package, they can also be utilized as "black boxes" to provide image reorganization in existing image processing systems. The two focal-plane IC's represent the first integration of a 256 × 256 CCD image sensor with additional charge-domain circuitry to enable image reformatting at video rates (28 frames/s). The image reformatting circuitry occupies 2% of the active chip area and inconsequentially increases IC power dissipation. Signal integrity is not compromised by the structure since charge-transfer efficiency is high and the number of transfers is not increased. A summary contrasting characteristics and performance of the four IC's is given in Table II.
During the past 16 years he has been involved in developing a wide variety of CCD imagers and signal processors, including designing and successfully demonstrating a 4096 × 4096 imaging array, the highest resolution imager in the world produced to date. He has authored more than 20 publications, holds two patents, and is presently Director of the Advanced CCD Technology Department at Loral Fairchild Imaging Sensors in Newport Beach, CA.
Dr. Bredthauer is a member of the American Physical Society. 
