The design of a CCD image half-toner integrated monolithically on the focal plane with a 256 X 256 frame transfer imager is reported and the algorithm used is discussed. The imager/half-toner chip is projected to achieve a throughput of 30 frames per second.
INTRODUCTION
Image half-toning is the process of transforming a continuous-tone image into a bi-level image such that the average local brightness (darkness) is unchanged. One faces such a task when trying to display gray scale images on devices which are binary in nature. An image half-toner can be defined by its input and output. The input to an image half-toner is a two-dimensional analog array, in which each element may take any value between a minimum, such as 0, and a maximum, such as 1. The output is a two-dimensional binary array, in which each element is either 0 or 1. The image half-toner can be viewed as a binary encoder of an analog signal.
The gray scale levels at the input are simulated by the spatial distributions of bright and dark elements at the output. Therefore, competition between good gray scale reproduction and high spatial resolution is inherent in the image half-toning process. A good image half-toning algorithm must have good frequency response at both ends of the spectrum: good low frequency response for good gray scale reproduction and good high frequency response for high spatial resolution. It also should avoid the introduction of processing artifacts associated with image half-toning such as false contours, false details, and visible micropatterns. Finally, an image half-toning algorithm should be of reasonable complexity to be able to handle large images at moderate speeds.
A trivial algorithm is to compare each analog input to a fixed threshold to determine the binary value at the output. In addition to the introduction of false contours, this algorithm has a low quality of both gray scale reproduction and fine detail preservation of the original image. Another simple algorithm is to represent each picture element (pixel) at the input by a block of pixels at the output. Each gray level is simulated by a different set of binary assignments to the pixels in the block. Although such an algorithm may achieve good gray scale reproduction, it is rejected because of the serious degradation of the input image resolution. Thus, in the above definition of the half-toning process, the size of the two-dimensional array at the output (number of pixels) is assumed to be equal to that at the input.
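The loss of gray scale reproduction under fixed thresholding can be seen in a small example. The following is a minimal sketch, not part of the reported design; the function names, the 8 X 8 test image, and the threshold of 0.5 are illustrative assumptions.

```python
def fixed_threshold(image, threshold=0.5):
    """Compare every analog pixel (0..1) to one fixed threshold."""
    return [[1 if value > threshold else 0 for value in row] for row in image]


def mean(image):
    """Average brightness of a 2-D image stored as a list of rows."""
    values = [value for row in image for value in row]
    return sum(values) / len(values)


# A uniform 30% gray patch (hypothetical test input).
flat_gray = [[0.3] * 8 for _ in range(8)]
binary = fixed_threshold(flat_gray)

# The input averages about 0.3, but every output pixel is 0,
# so the average local brightness is not preserved.
print(mean(flat_gray), mean(binary))
```

Any uniform region darker than the threshold maps to all 0 and any brighter region to all 1, which is also why a smooth ramp turns into a single abrupt edge (a false contour).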
Despite the numerous image half-toning algorithms that have been introduced over the years, only a few of them have been realized in hardware1 . In this paper, an image half-toning algorithm based on those introduced by Floyd and Steinberg5'7 and Schroeder' is discussed. The design of a charge-coupled device (CCD)-based circuit to implement that algorithm is then presented. This circuit is integrated on the focal plane with a CCD imager. A discussion of this design approach concludes the paper.
ALGORITHM
The algorithm used in this work is a combination of the error diffusion algorithm introduced by Floyd and SteinbergZ7 and the deterministic interpolation strategy described by Schroeder1. The aim is to produce good quality output images utilizing a moderately complex process. The algorithm can be explained by the block diagram shown in Fig. 1. The assignment of a binary value to the output pixel introduces a deviation from the analog value of the input pixel. This deviation or error is calculated as the difference between the analog value of the input pixel and the binary value of the output pixel. The error is diffused into a local neighborhood of pixels which have not yet been processed. The original analog value of an input pixel and the error diffused to that pixel constitute the modified value of the pixel. The modified value of a pixel is compared to a threshold value to determine the binary value of the corresponding output pixel.
A diffusion matrix determines the shape and the size of the local neighborhood into which the error is diffused. It also determines how much of the error is added to each pixel in the local neighborhood. Floyd and Steinberg'' employed a local neighborhood of four pixels. The image is scanned in a conventional way from left to right and from top to bottom. The error introduced by processing the pixel P is calculated. Seven sixteenths of the error is added to the pixel to the right of P while five sixteenths of the error is added to the pixel below P. Three sixteenths of the error is added to the pixel below P to the left and the rest of the error (one sixteenth) is added to the pixel below P to the right. This diffusion matrix is shown in Fig. 2.
Error diffusion is an adaptive algorithm. The decision of assigning a binary value to an output pixel is affected by the previous decisions. An inaccurate assignment to a certain output pixel produces an error which propagates into the neighboring input pixels, and the algorithm tends to correct itself when processing these pixels. Such an algorithm reduces the accuracy requirement of the circuits that implement it.
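The four-pixel diffusion matrix described above can be written down directly as a table of offsets and weights. The sketch below only illustrates how one error value is shared out; the names `FLOYD_STEINBERG` and `diffuse`, and the choice to drop shares that fall outside the image, are assumptions made for the example.

```python
# (row offset, column offset, weight) relative to the pixel P just processed.
FLOYD_STEINBERG = [
    (0, 1, 7 / 16),   # right of P
    (1, 0, 5 / 16),   # below P
    (1, -1, 3 / 16),  # below P, to the left
    (1, 1, 1 / 16),   # below P, to the right
]


def diffuse(buffer, row, col, error):
    """Add weighted shares of `error` to the unprocessed neighbors of (row, col)."""
    height, width = len(buffer), len(buffer[0])
    for d_row, d_col, weight in FLOYD_STEINBERG:
        r, c = row + d_row, col + d_col
        # Assumption: shares that would land outside the image are discarded.
        if 0 <= r < height and 0 <= c < width:
            buffer[r][c] += weight * error


# Diffuse a unit error from the top-center pixel of a 3 x 3 buffer.
buffer = [[0.0] * 3 for _ in range(3)]
diffuse(buffer, 0, 1, 1.0)
for line in buffer:
    print(line)
```

The four weights sum to one, so away from the image borders the whole error is handed on to pixels that are still to be processed.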
Using the diffusion matrix of Fig. 2 to implement the algorithm yields the block diagram shown in Fig. 3. The diagram shows a comparison between two local neighborhoods. The input to the first node of the comparator is the sum of the current input pixel and the weighted average of the four local input pixels. The input to the second node of the comparator is the sum of the threshold value and the corresponding weighted average of the four local output pixels. The result of the comparison is the binary value of the current output pixel. The adaptive nature of the algorithm is achieved through the feedback loop. (In Fig. 3, W denotes the weighting coefficients.)
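The step from error diffusion to a comparison between two neighborhoods can be made explicit. The following is a sketch, assuming (as stated in the algorithm description) that the error of a processed pixel is its analog input value minus its binary output value. The symbols are notation introduced here and do not come from the figures: x is an input value, b an output value, e an error, u the modified value, T the threshold, and w_k the weight of the k-th previously processed neighbor P_k of the current pixel P.

```latex
\begin{align*}
e(P_k) &= x(P_k) - b(P_k), \qquad k = 1, \dots, 4 \\
u(P)   &= x(P) + \sum_{k=1}^{4} w_k \, e(P_k)
        = x(P) + \sum_{k=1}^{4} w_k \, x(P_k) - \sum_{k=1}^{4} w_k \, b(P_k) \\
b(P) = 1 &\iff u(P) > T
        \iff x(P) + \sum_{k=1}^{4} w_k \, x(P_k) \;>\; T + \sum_{k=1}^{4} w_k \, b(P_k)
\end{align*}
```

The left-hand side of the last inequality is the first comparator node (current input pixel plus weighted input neighborhood), and the right-hand side is the second node (threshold plus weighted output neighborhood).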
CCD IMPLEMENTATION
The realization of the above algorithm requires both analog and digital memory to temporarily store the input and output pixels of the local neighborhood. CCD serial shift registers, which operate in the charge domain, can be utilized as both analog and digital short-term memory. Moreover, summation and weighting can be easily performed in the charge domain using CCD circuitry. Finally, integrating an image half-toner monolithically on the focal plane dictates the choice of a technology that is compatible with the imaging technology. For the above reasons, CCD technology was chosen to realize the image half-toner and integrate it with a CCD imager.
Figure 4 shows the circuit block diagram that realizes the image half-toner integrated with an imager. The imager is a 256 X 256 pixel full fill-factor frame transfer 3-phase buried-channel imager. The pixel size is 12 X 12 µm. The estimated charge handling capability is 500,000 electrons per pixel. The image frame is read out in the conventional way: in parallel from the image zone to the memory zone to the output serial register, and serially from the output register to the output amplifier. The output amplifier is a surface-channel NMOS source follower designed to have a sensitivity of 6 µV per electron at 25 MHz clock frequency. The output amplifier transforms the signal from the charge domain to the voltage domain, interfacing the imager to the half-toner.
The image half-toner has two main sections, the pipe organ and the comparator. The pipe organ has two sets of five buried-channel delay lines. The first set is controlled by the analog voltage signal fed forward from the source follower at the output of the imager, while the second set is controlled by the digital voltage signal fed back from the output of the half-toner.
Each delay line in the first set corresponds to a pixel in the input local neighborhood including the current pixel, while each delay line in the second set corresponds to a pixel in the output local neighborhood including the threshold value. At the input of each delay line there is a surface-channel fill-and-spill structure that transforms the signal from the voltage domain to the charge domain. The length and the channel width of each delay line correspond to the position and the weighting coefficient, respectively, of the pixel that the delay line represents. Each set of delay lines is followed by a surface-channel source follower, and the summation of the five signals coming out of each set of delay lines is performed in the charge domain on the input gate of the corresponding source follower. The two voltage signals at the output of these two source followers are fed forward to the two input nodes of the comparator. The comparison is performed in the voltage domain using a flip-flop-based surface-channel NMOS comparator. The output of the comparator is the binary value of the current output pixel.
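The behavior of the pipe organ and comparator can be imitated in software with two shift registers and a set of delayed taps. The following is an idealized numerical sketch, not a model of the charge-domain circuit. The tap delays are an assumption inferred from raster scanning a line of `width` pixels, the weights are the Floyd and Steinberg values quoted earlier, both registers are assumed to start empty, and no special handling is applied at line ends, so taps wrap from one line to the next. The current pixel and the threshold, which the chip also routes through delay lines, appear here as direct terms. All names are hypothetical.

```python
from collections import deque


def make_taps(width):
    """(delay in pixel clocks, weight) of each earlier pixel feeding the current one."""
    return [
        (1, 7 / 16),          # pixel to the left
        (width - 1, 3 / 16),  # pixel above and to the right
        (width, 5 / 16),      # pixel directly above
        (width + 1, 1 / 16),  # pixel above and to the left
    ]


def halftone_stream(pixels, width, threshold=0.5):
    """Half-tone a raster-ordered stream of analog pixels (0..1); requires width >= 2."""
    taps = make_taps(width)
    depth = width + 1  # longest delay that has to be stored
    in_line = deque([0.0] * depth, maxlen=depth)   # analog memory: past inputs
    out_line = deque([0.0] * depth, maxlen=depth)  # digital memory: past outputs
    result = []
    for x in pixels:
        # register[-d] is the sample that entered d clocks ago.
        node_a = x + sum(w * in_line[-d] for d, w in taps)
        node_b = threshold + sum(w * out_line[-d] for d, w in taps)
        bit = 1 if node_a > node_b else 0  # comparator decision
        in_line.append(x)                  # feed-forward path
        out_line.append(float(bit))        # feedback path
        result.append(bit)
    return result


# Usage: a uniform mid-gray 8 x 8 frame delivered as a serial stream.
bits = halftone_stream([0.5] * 64, width=8)
print(bits[:8])
```

In this picture the delay of a tap plays the role of the delay-line length and the weight plays the role of the channel width.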
DISCUSSION
The floor plan of the imager/half-toner chip is shown in Fig. 5. Most of the chip area is occupied by the imager (approximately 4 X 7 mm²). The area occupied by the half-toner is approximately 4 X 1 mm².
A serial shift register is added above the image zone of the imager to facilitate injecting charge electrically into the imager for testing purposes. The output of the imager can be monitored or directly coupled to the half-toner. The half-toner is configured such that the two output nodes of the pipe organ can be monitored and two voltage signals can be applied to the two input nodes of the comparator. This facilitates the tasks of testing the pipe organ and the comparator individually. Moreover, this configuration allows the replacement of the comparator by one off-chip if the performance of the former is found to be unsatisfactory.
The imager/half-toner chip is being fabricated in a commercial CCD foundry using a triple-polysilicon double-metal process, one metal level being devoted to light shielding the memory/processor sections. The process is capable of selectively fabricating surface-channel and buried-channel devices. The tradeoff between noise performance and linearity determined which devices are surface-channel and which are buried-channel. The imager and the delay lines are buried-channel for better noise performance, while the source followers, the fill-and-spill structures, and the comparator are surface-channel for better linearity.
The signal in the imager part of the chip is in the charge domain. As the signal travels from the imager to the pipe organ section of the half-toner, it is transformed to the voltage domain via the source follower and back to the charge domain via the fill-and-spill structures, achieving a gain factor of approximately four. The larger size of the charge packet in the processor part of the chip is expected to achieve better accuracy.
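The signal levels implied by the quoted figures can be estimated with simple arithmetic. The sketch below is a back-of-the-envelope calculation only; it assumes a pixel filled to its estimated capacity and treats the gain of approximately four as applying to the total size of the charge packet, neither of which is stated explicitly in the design description.

```python
FULL_WELL_ELECTRONS = 500_000  # estimated charge capacity per imager pixel
SENSITIVITY_V_PER_E = 6e-6     # output amplifier sensitivity, 6 uV per electron
CHARGE_GAIN = 4                # approximate imager-to-processor gain factor

# Voltage swing at the imager output for a full pixel: about 3 V.
full_scale_volts = FULL_WELL_ELECTRONS * SENSITIVITY_V_PER_E

# Corresponding charge packet in the processor section (assumption: the gain
# applies to the whole packet): about two million electrons.
processor_packet_electrons = FULL_WELL_ELECTRONS * CHARGE_GAIN

print(full_scale_volts, processor_packet_electrons)
```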
The value of the threshold can be programmed via an analog voltage signal. The processor is projected to operate at 25 MHz clock frequency, achieving a throughput of 30 frames per second.
ACKNOWLEDGMENTS
