The article presents measurement results of prototype integrated circuits for acquisition and processing of images in real time. In order to verify a new concept of circuit solutions for analogue image processors, experimental integrated circuits were fabricated. The integrated circuits, designed in a standard 0.35 µm CMOS technology, contain the image sensor and analogue processors that perform low-level convolution-based image processing algorithms. The prototype with a resolution of 32 × 32 pixels allows the acquisition and processing of images at high speed, up to 2000 frames/s. Operation of the prototypes was verified in practice using the developed software and a measurement system based on an FPGA platform.
Introduction
In many monitoring, navigation and control systems, image processing techniques are frequently used for object control. An important component of such systems is an image sensor responsible for image acquisition and pre-processing. Depending on the application, the image sensor should have low power consumption or high processing speed. For example, in robotics or biomedical implants, low power consumption is of primary importance, because such systems are typically powered from small power sources, such as batteries or small solar cells. On the other hand, in road traffic control or safety monitoring systems, high-speed image processing is an important requirement. A characteristic feature of all image sensors is the extremely large amount of data that needs to be processed in a relatively short time interval. In order to improve the overall efficiency of image processing, two-stage processing is typically implemented in microelectronic vision systems. The first stage, consisting of image acquisition and application of numerically intensive early-vision enhancement algorithms, is usually performed in so-called vision-chips. Further processing steps are typically implemented in the main processors located in the central unit of the system. The quality and efficiency of image processing systems mainly depend on the quality of the vision-chips. A complete vision-chip contains a photo-detector array and embedded low-level programmable vision processors.
The very early integrated vision-chips were mostly dedicated to a specific image algorithm and could not be reconfigured. The next generations of programmable vision-chips, designed in multiple instruction multiple data (MIMD) or single instruction multiple data (SIMD) architectures, were able to perform several image algorithms [1] [2] [3] [4] [5]. The newest chips have fully programmed architectures with parallel analogue data processing and significantly reduced processing time [6] [7] [8] [9]. Although the development in vision-chip implementations is impressive, supply power consumption and image processing speed still need to be improved.
As mentioned before, the vision-chips consist of two main components, the vision sensors and the low-level processors. The vision sensors are constructed using p-n diodes or transistors exposed to light, which convert light to an electrical signal. A relatively small photo-current in the pA range, generated by the photo-sensors, can be converted to voltage with an amplitude of about 1 V in order to allow processing in the voltage-mode analogue processors [10] [11] [12] [13] [14] [15] [16] [17]. Another approach is to directly amplify the photo-current to the µA level, in order to be processed in the current-mode analogue processors [5] [6] [18] [19]. In such circuit solutions, the main part of supply power is consumed by the analogue processors. Therefore, if the primary requirement is the reduction of supply power consumption, the analogue processors have to be designed using specially optimised architectures.
The presented CMOS implementation of the vision-chip addresses some of the outlined weaknesses of the known solutions. The considered solution, based on parallel signal processing, is dedicated to high-speed applications. The key advantage of this chip is the high speed of image processing, reaching 2000 frames per second. The circuit, depending on the requirements, can also be reconfigured to have relatively low power consumption in the range of several mW. The analogue processor in this vision-chip can perform convolution filtering of images based on a full 3 × 3 kernel, where the kernel coefficients can be reprogrammed on the fly, even during the image processing time.
The paper is organized as follows. An overview of the basic parameters and methods for their measurement is given in Section 2. The following section presents details of the implementation of a proof-of-concept prototype integrated circuit. The results of the circuit measurements are presented and discussed in Section 4. The final conclusions are given in the last section.
Basic parameters of vision-chips and their measurements
Two kinds of parameters are used to characterize the vision-chips. The first group of parameters describes properties of the image sensors, whereas the other group characterizes the embedded image processors. The sensors are mainly characterized by electro-optical parameters, whereas the processors are characterized by electrical parameters. An overview of the key parameters belonging to each group is presented in the following subsections.
Parameters of image sensors
The basic parameters of image sensors are: resolution, frame rate, fixed pattern noise (FPN), photo-response non-uniformity (PRNU), random noise, signal-to-noise ratio (SNR), dynamic range (DR), linearity and sensitivity [20, 21].
FPN, also known as dark signal non-uniformity or offset distribution, is a measure of pixel-to-pixel variation when the image sensor array is in the dark. It is primarily due to dark current differences of the photo-sensitive devices, reset noise and mismatches of the read-out circuitry. It is a signal- and time-independent noise, which is additive to other types of noise in the sensor. FPN is measured as a standard deviation (or as a peak-to-peak variation) along a line of an image, averaged over several frames to remove random noise. FPN is usually equal to a few mV rms, and is frequently given in percent as the ratio of the rms voltage noise to the full-scale output voltage swing.
PRNU, also referred to as gain distribution, is a pixel-to-pixel variation in the response of a photo-sensor array to fixed-intensity light:

$$\mathrm{PRNU} = \frac{\sigma(V_{out})}{V_{out,max}} \cdot 100\% \qquad (1)$$

where V_out,max is the full-scale output voltage swing of a pixel, and σ(V_out) is the standard deviation of the pixel signals under uniform lighting. PRNU is typically measured at 50% of the saturation level. The measurements of FPN and PRNU are ideally performed with an integrating sphere that allows generation of very uniform illumination, with typical variation less than 0.5% over the complete field of view. Such an integrating sphere can be combined with various light sources, such as LEDs with selected wavelengths or even with a monochromator, if continuous tuning of the wavelength is desired.
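As an illustration, the FPN and PRNU figures described above could be estimated from a stack of captured frames roughly as follows. This is only a sketch under stated assumptions: the function names are hypothetical, frames are plain nested lists of pixel voltages, and the spread is taken over all pixels of the averaged frame rather than along a single image line.

```python
from statistics import pstdev


def average_frames(frames):
    """Average several frames pixel by pixel to suppress temporal (random) noise."""
    n = len(frames)
    rows, cols = len(frames[0]), len(frames[0][0])
    return [[sum(f[r][c] for f in frames) / n for c in range(cols)]
            for r in range(rows)]


def nonuniformity_percent(frames, v_out_max):
    """Spatial standard deviation of the averaged frame, in % of full-scale swing.

    Dark frames give an FPN estimate; uniformly lit frames (about 50% of
    saturation) give a PRNU estimate. Using all pixels instead of a single
    image line is an assumption of this sketch.
    """
    avg = average_frames(frames)
    pixels = [v for row in avg for v in row]
    return 100.0 * pstdev(pixels) / v_out_max
```

Calling `nonuniformity_percent` with frames recorded in the dark would yield an FPN estimate, whereas frames recorded under uniform illumination would yield a PRNU estimate.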
The sensor's random noise is independent across pixels and varies from frame to frame. This noise is typically measured in the dark and with short shutter times in order to avoid the effect of photon shot noise and leakage currents. The random noise is computed on the basis of many images (frames). In this case, the standard deviation (i.e. rms value) of the output voltage for one pixel yields the sensor noise floor, usually expressed in mV. With respect to the sensor noise, the term read-out noise is often used. It is random noise, occurring at the output of a complete vision-chip, which includes the sensor noise as well as the noise generated in the readout circuitry, comprising analogue amplifiers and AD converters. The noise floor of an image sensor is typically given in the absence of illumination. However, the rms value of the random noise depends on the illumination level, and in most cases this noise increases with higher illumination. SNR, expressed in decibels, is the ratio of the sensor's output voltage to the rms value (or the standard deviation) of the random noise, at a given illumination level and shutter time.
Another related parameter, DR, determines the range of illumination that can be detected by the sensor. DR can be simply computed from the sensor's electrical parameters, as the ratio of the full-scale output voltage swing to the rms value of the dark random noise.
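The noise, SNR and DR definitions above can be sketched in a few lines. The helper names are hypothetical, and the 20·log10 convention for voltage ratios is an assumption of this sketch rather than something stated explicitly in the text.

```python
import math
from statistics import pstdev


def temporal_noise_rms(pixel_samples):
    """RMS random noise of one pixel: standard deviation over many frames."""
    return pstdev(pixel_samples)


def ratio_db(signal_v, noise_rms_v):
    """Voltage ratio in decibels (20*log10 convention, assumed here)."""
    return 20.0 * math.log10(signal_v / noise_rms_v)
```

With these helpers, DR would be `ratio_db(full_scale_swing, dark_noise_rms)`, and SNR would be `ratio_db(output_voltage, noise_rms)` with both values taken at the same illumination level and shutter time.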
The linearity of a sensor can be determined from a transfer characteristic, measured for different light intensities beginning from the dark and ending at the saturation level, and then fitting a 'best fit' straight line from 25% to 75% of the saturation. The maximum peak-to-peak deviation of the output voltage from the 'best fit' straight line defines the absolute error (E_pp), which is used in the following definition of the linearity:

$$\mathrm{Linearity} = \frac{E_{pp}}{V_{out,max}} \cdot 100\% \qquad (2)$$
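A minimal sketch of the fitting procedure described above is given below. It assumes an ordinary least-squares line and evaluates the peak-to-peak deviation only over the fitted 25% to 75% range; both choices, as well as the function names, are assumptions of this example.

```python
def best_fit_line(xs, ys):
    """Ordinary least-squares straight line; returns (slope, intercept)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def peak_to_peak_error(light, v_out, v_sat):
    """Peak-to-peak deviation from the line fitted between 25% and 75% of v_sat."""
    points = [(x, y) for x, y in zip(light, v_out)
              if 0.25 * v_sat <= y <= 0.75 * v_sat]
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    slope, intercept = best_fit_line(xs, ys)
    deviations = [y - (slope * x + intercept) for x, y in points]
    return max(deviations) - min(deviations)
```

The returned value corresponds to the absolute error; it can then be related to the full-scale output voltage swing to express it in percent.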
Parameters of analogue processors
Measurements of analogue processor performance are difficult in practice. The basic parameters of digital processors, such as speed and accuracy, can be readily determined from the number of executed instructions per second and the number of bits used for signal representation. The situation is much more difficult for analogue processors, where only equivalent parameters can be determined. One practical method of measuring such parameters is based on the comparison of two results, obtained from ideal numerical processing and from the processing accomplished with the analogue processor under test. In this case, the same test image is processed by both processors, and then the difference is calculated for all image pixels. The equivalent resolution is determined for the case where the maximal error is less than half of the output signal change caused by a change of the least significant bit (LSB) in the numerical processing. This kind of test can be performed only if the tested vision-chip allows processing of an external reference image; in other cases, only the total processing error for both the image sensor and the processor can be determined.
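The equivalent-resolution criterion described above can be illustrated as follows. This is a sketch under assumptions: one LSB is taken as `full_scale / 2**n`, the search is capped at a hypothetical `max_bits`, and images are given as flat lists of output values.

```python
def equivalent_bits(ideal, measured, full_scale, max_bits=16):
    """Largest n for which the maximal error stays below half an n-bit LSB.

    Assumes one LSB equals full_scale / 2**n; returns 0 if even a 1-bit
    resolution is not reached.
    """
    max_error = max(abs(a - b) for a, b in zip(ideal, measured))
    bits = 0
    for n in range(1, max_bits + 1):
        if max_error < 0.5 * full_scale / (2 ** n):
            bits = n
        else:
            break
    return bits
```

For example, a maximal error of about 1% of the full scale passes the half-LSB test for 5 bits but fails it for 6 bits.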
The speed of digital processors, expressed as the number of frames per second, strongly depends on the complexity of the implemented algorithms. For the analogue processors, the complexity of algorithms has less impact on the processing speed, and therefore in many cases they are much faster for complicated algorithms. Due to this fact, the speed of the analogue processors is typically determined as an average parameter for several selected algorithms.
High-speed image sensor

Vision chip architecture
The experimental image sensor is fabricated in a 0.35 µm CMOS process provided by Austria Micro Systems. The silicon structure, measuring 2783 µm × 2583 µm, is embedded in a JLCC 68 package. Fig. 1a shows the prototype chip with the top cover removed so that the lens can be installed. The architecture of the high-speed image sensor is selected to meet two contradictory requirements: high-speed processing and low supply power consumption. Because the primary applications of the designed sensor are in monitoring and control systems, the sensor architecture and the circuit solution have been simplified. In such systems, an accuracy of image signal processing equivalent to a resolution of 5 to 6 bits is sufficient for satisfactory results [22].
To achieve a good trade-off between speed and power consumption, the SIMD architecture using analogue processors containing multipliers and summers is selected [23]. The general architecture of the prototype chip is shown in Fig. 1b [24]. The photo-pixel array contains 32 × 32 pixels and is located in the center of the chip. Two sets of analogue processing elements (APE) are arranged in columns placed on the right and the left sides of the pixel array, and work in parallel. This kind of circuit arrangement enables relatively simple and short signal paths with small stray capacitances, which helps to achieve fast signal transmission and to reduce the power dissipated on switching. By making the photo-pixel sensor as simple as possible, containing only the necessary circuits for signal acquisition, it is possible to significantly reduce the array dimensions and the length of signal paths. The accompanying logic circuits control the sequence of processed image samples. The architecture presented in Fig. 1b guarantees a one clock cycle calculation of a complete result for each image pixel, regardless of the complexity of the realized image algorithm. Due to parallel processing of a complete set of 9 signals coming from all the neighboring photo-pixels (as shown in Fig. 2), a high data throughput is achieved. The time needed to complete the calculations of a single image frame is equal to the product of the number of columns and the clock period. The details of the APE circuit realization are given in [23, 24].
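The processing scheme described above can be mimicked in software as a reference model. The sketch below is an illustrative assumption, not the circuit's actual behaviour: the kernel is applied without flipping, border pixels are left at zero, and the outer loop over columns merely stands in for the one-column-per-clock-cycle operation of the APEs.

```python
def convolve3x3(image, kernel):
    """Weighted sum of each pixel's 3x3 neighbourhood (borders left at zero)."""
    rows, cols = len(image), len(image[0])
    out = [[0.0] * cols for _ in range(rows)]
    for c in range(1, cols - 1):        # one column per clock cycle (assumed)
        for r in range(1, rows - 1):    # rows are handled in parallel on the chip
            out[r][c] = sum(kernel[i][j] * image[r - 1 + i][c - 1 + j]
                            for i in range(3) for j in range(3))
    return out


def frame_time(columns, clock_period_s):
    """Frame processing time: number of columns times the clock period."""
    return columns * clock_period_s
```

For instance, with 32 columns and a hypothetical clock period of 100 ns, `frame_time(32, 100e-9)` gives 3.2 µs.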
Image sensor
All pixels in the photo-array process images in the same time slot by using an electronic shutter that eliminates image smearing caused by fast movement of the object. The shutter time can be varied in order to adjust the photosensitivity to different illumination conditions. An important feature of the designed CMOS sensor is independent control of the shutter time and the readout pixel clock, which helps to achieve relatively good images even in poor illumination. The schematic of an active pixel sensor (APS) is shown in Fig. 3a. Such an APS configuration ensures a linear response to illumination and good noise performance [20]. The pixel is composed of five functional circuits: the photodiode made as an n-well over a p-substrate, the reset transistor M1, the source-follower M2 biased by the current sink M3, the shutter switch M4, the storage capacitor C_MEM, and the buffer for non-destructive readout driving a row interconnecting line. In order to reduce the power consumption, the drain current of M3 is switched to zero after closing the shutter switch. Furthermore, the signal "Enable" activates the output buffer only during the read time of the selected column. As a result, each functional circuit is activated only for the short time interval in which it is needed. A single cycle of the pixel operation starts with resetting, which is initialized by pulling the signal "Rst" low and charging the photodiode capacitance C_D to V_DD. At the same time, the signal "Shutter" is set to V_DD and C_MEM is charged to V_REF = V_DD − V_GS1 ≈ 2.1 V. The transition of the signal "Rst" from zero to V_DD starts the integration of the photodiode current. The speed of discharging C_D and C_MEM is proportional to the energy of incident light, which assures a linear photo-electrical conversion. At the end of the integration time, the shutter switch is opened, and the final charge is stored in the capacitor C_MEM.
The readout of the voltage (charge) stored in C_MEM is started by activation of the signal "Enable", which controls the pixel output buffer. The measured waveforms illustrating the operation of the circuit are shown in Fig. 4. The waveform <1> shows the voltage on the memory capacitor C_MEM, which represents image data. The logic signals <2>, <3>, and <4> respectively drive the reset, drive the shutter, and control the reading of the array columns. To improve the clarity of Fig. 4, the low level of the "Rst" signal is set to the relatively large width of 20 ms, which in reality is about 500 ns. When "Rst" reaches the high level, the integration starts, and voltage <1> decreases linearly until the shutter <3> drops to low. At this moment, the voltage on the capacitor C_MEM represents actual image data and can be further processed. The sequential reading of the consecutive array columns can be observed based on signal <4>, which is activated and becomes the master clock synchronizing the process of reading.
Prototype testing and measurements
For practical testing of the fabricated vision-chips, the measuring system presented in Fig. 5 was developed. The system consists of three main components: (i) the test board with MAX154 ADCs, the socket for the vision-chip, and supply circuitry, (ii) an FPGA evaluation kit with a Xilinx Virtex4-SX35 chip, (iii) a personal computer with dedicated software to configure and control the operation of the test board. The lens is placed over the prototype vision-chip to create an image on the surface of the silicon structure. The video signals are transmitted outside the test chip over 4 analogue outputs, and then they are converted to digital form by 4-channel 12-bit ADCs. All the necessary digital control signals are generated using the FPGA platform and 4 variable pulse-width converters PWM0-PWM4. The test board system is controlled by the software Picoblaze microcontroller, which communicates with the personal computer (PC) via the 1 Mbit/s USB/UART port. The data obtained from the test board is stored in RAM or on a hard disk for further offline processing. The software, running on a PC, controls the configuration of the vision-chip (Fig. 6a) and enables visualization of the obtained data (Fig. 6b). As Fig. 6c shows, the vision-chip configuration includes definition of all the convolution kernel coefficients. The developed software also allows FPN reduction of the photo-sensor matrix, which is based on the parameters specified in the window shown in Fig. 6d. FPN reduction uses the method presented in [25], where invariability of FPN in time and a linear pixel characteristic are assumed. Under such assumptions the pixel calibration process can be expressed by the following simple equation [25, 26]:

$$V_{corr} = G_{corr}\,(V_{pix} - V_{off}) \qquad (3)$$

where V_corr is the vision signal after correction, V_pix and V_off denote respectively the original signal and the dark offset of the pixel, and G_corr is the gain correction coefficient. In order to find the optimal correction parameters V_off and G_corr, the expression (3) is evaluated for two different levels of sensor illumination. The achieved correction parameters are stored on a hard disk for later use, when the calibration is needed.
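A two-point calibration of this kind might be sketched as follows, assuming the correction has the gain-and-offset form V_corr = G_corr (V_pix − V_off). The choice of the array mean at each illumination level as the target value, the flat-list frame layout and the function names are all assumptions of this example, not details taken from [25].

```python
def two_point_calibration(frame_low, frame_high):
    """Per-pixel gain and offset from two uniformly illuminated frames.

    Frames are flat lists of pixel voltages. After correction every pixel
    maps to the array mean at each of the two levels (assumed target).
    """
    target_low = sum(frame_low) / len(frame_low)
    target_high = sum(frame_high) / len(frame_high)
    gains, offsets = [], []
    for p_low, p_high in zip(frame_low, frame_high):
        gain = (target_high - target_low) / (p_high - p_low)
        gains.append(gain)
        offsets.append(p_low - target_low / gain)
    return gains, offsets


def apply_correction(frame, gains, offsets):
    """V_corr = G_corr * (V_pix - V_off), evaluated pixel by pixel."""
    return [g * (v - o) for v, g, o in zip(frame, gains, offsets)]
```

The gains and offsets would be computed once, stored, and then applied to every subsequently acquired frame.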
Sensor performance
Image sensors always suffer from technology-related nonidealities that limit the performance of the vision system. In the presented vision-chip there are two main sources of nonidealities: the variation of the DC offset of the APEs, and the mismatch between individual photo-pixel parameters. The DC offsets are mainly caused by the mismatch between currents biasing the individual APEs. For uniform illumination, those offsets manifest as the row-to-row pattern shown in Fig. 7a. This kind of inter-row FPN can be easily removed in the data acquisition process. The result of removing the inter-row FPN is shown in Fig. 7b, where only the pixel-to-pixel FPN is visible. The pixel-to-pixel FPN can be reduced by applying correlated double sampling (CDS). In such a case, each output of a pixel is read twice, after reset and at the end of the integration time. The corrected signal is obtained as the difference of these two values. In the presented chip, CDS is realized externally in the FPGA platform. For the tested chips, the measured pixel-to-pixel FPN is about 2 mV rms. Fig. 7c shows the image after CDS correction, where FPN is reduced to about 0.2 mV rms. The accuracy of the convolution calculation is also affected by nonlinearity of the image sensor, which primarily results from nonlinear characteristics of the photo-diode and the pixel's circuitry. The light-to-voltage characteristic of a complete sensor is presented in Fig. 8 for an integration time equal to 100 µs. The nonlinearity determined within the full output voltage swing is 2.6%. In Fig. 8, the saturation voltage is 1400 mV and the measured random noise in the dark is 1.8 mV rms, which gives an optical dynamic range DR = 57 dB. Under 500 lux illumination, the random noise increases to 4 mV rms, thus the peak output signal-to-noise ratio is SNR = 50.5 dB. A single photo-pixel consumes 0.4 µA of the supply current while recording an image and needs 10 µA during the readout period.
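The CDS step described above reduces to a per-pixel subtraction. The following sketch assumes that the reset sample is the higher voltage, so the difference is taken as reset minus signal; the function name and frame layout are illustrative.

```python
def correlated_double_sampling(reset_frame, signal_frame):
    """Per-pixel difference between the sample taken after reset and the
    sample taken at the end of the integration time."""
    return [[r - s for r, s in zip(reset_row, signal_row)]
            for reset_row, signal_row in zip(reset_frame, signal_frame)]
```

Two pixels with different reset levels but the same light-induced voltage drop give the same corrected value, which is how the pixel-to-pixel offset is removed.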
The shutter is opened for the entire array of the photo-pixels, which means that the complete array consumes 0.4 µA × 32 × 32 ≈ 410 µA during a typical acquisition interval of 15 µs. The image data transfer and processing require activation of three columns and all APEs at the same time. For typical working conditions of 25 frames/s, the time needed for processing of a complete row of images is 3.2 µs, which makes an average power consumption of 7.2 µW (780 µA per APE). The complete chip, including the biasing circuits and digital logic, consumes 21 µW on average. The summary of the main parameters of the vision-chip is given in Table 1.
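The arithmetic behind such duty-cycled consumption figures can be sketched as below. The supply voltage is not stated in this section, so it is left as a parameter; the function names are hypothetical, and the sketch assumes a block that draws a constant current while active and none otherwise.

```python
def array_current(pixel_current_a, rows, cols):
    """Total supply current of the pixel array while the shutter is open."""
    return pixel_current_a * rows * cols


def average_power(current_a, supply_v, active_time_s, frames_per_s):
    """Average power of a block that is active for active_time_s once per frame."""
    return current_a * supply_v * active_time_s * frames_per_s
```

For a 32 × 32 array, `array_current(0.4e-6, 32, 32)` evaluates to about 409.6 µA.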
Examples of image processing
The presented prototype is a small-size, proof-of-concept device fabricated mainly for quality testing of the implementation of early-vision algorithms. Selected examples of image processing are presented in Fig. 9. Fig. 9a shows a raw image for reference, whereas Fig. 9b illustrates the left-bottom corner detection, Fig. 9c the vertical edges detection, Fig. 9d all the edges detection, and Fig. 9e shows the result of the low-pass filtering. For the presented results, the achieved image processing accuracy is about 5-6 bits. The total processing error was evaluated for the images presented in Fig. 9b-e. For these images, the achieved processing errors (ignoring the border effects) are equal to 1.8%, 2.2%, 2.9%, and 1.8%, respectively. The error was defined as the RMS of the difference between the perfect processing, using 16-bit resolution, and the results achieved with the use of the prototype vision-chip. Even though the analogue computations are limited in accuracy, the final result seems to be satisfactory for many computer vision applications. All the images in Fig. 9 are obtained without CDS. Besides the mentioned errors, the images are also distorted by the effect of discharging the memory capacitor (C_MEM in Fig. 3a). This effect results from the influence of leakage currents caused by intensive light penetrating the analogue circuitry in APEs. At 125 lux illumination level, a 25 mV per millisecond voltage drop of the image signal is observed, which makes 1.7% of the maximum output voltage swing. The time needed for image acquisition can be made relatively small (below 20 µs, typically 15 µs), whereas the data transfer from a photo-pixel to APE is extremely short (shorter than 100 ns), which means that the speed of image processing is as high as 20000 fps. The speed of image recording for the presented test circuit is mainly limited by the external ADCs and the throughput of the connection between the test board and the PC.
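The RMS processing error mentioned above could be evaluated along the following lines. Normalizing by the full-scale output range to obtain a percentage and skipping a one-pixel border are assumptions of this sketch, as is the function name.

```python
import math


def rms_error_percent(ideal, measured, full_scale):
    """RMS of (ideal - measured) over interior pixels, in % of full scale.

    A one-pixel border is skipped to ignore border effects (assumed width).
    """
    rows, cols = len(ideal), len(ideal[0])
    diffs = [ideal[r][c] - measured[r][c]
             for r in range(1, rows - 1) for c in range(1, cols - 1)]
    return 100.0 * math.sqrt(sum(d * d for d in diffs) / len(diffs)) / full_scale
```

Here `ideal` would be the result of high-resolution numerical processing of the raw image, and `measured` the corresponding output of the chip.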
In spite of that, a fast-moving object can be effectively observed. An exemplary result of such an observation, in the case of a black and white wheel, is presented in Fig. 10. Fig. 10a shows a raw image of the wheel rotating at 2000 rpm. The image, achieved in real time by application of an edge detection algorithm, is shown in Fig. 10.
Conclusions
A high-speed vision-chip that allows real-time focal-plane processing of grey-scale images is presented. The proof-of-concept prototype fabricated in 0.35 µm CMOS technology has been functionally tested and measured. The obtained measurement results prove the correctness of the assumed chip architecture and circuit solutions. A good trade-off between power consumption, accuracy of image processing and processing speed has been achieved. Due to a large surplus of image throughput, the presented image sensor can be extended to higher resolutions. Based on the presented design, it can be estimated that an image sensor of 128 × 128 resolution, fabricated in 0.18 µm CMOS technology [27], would occupy a 6 mm² chip area while dissipating only 100 µW of supply power.
