An Image Feature Associative Processor (IFAP), which extracts local and global features of input image data based on a bio-inspired parallel architecture, is proposed. It consists of an image sensor, a cellular automaton and pattern matching processors based on PWM analog-digital merged circuits. IFAP extracts image features at a standard video frame rate with low power dissipation.
INTRODUCTION
Next-generation computing systems will show their capability in the flexible recognition of complex objects and human faces. To implement many kinds of image recognition systems, algorithms and architectures have been studied on conventional digital computers using a software approach. However, real-time recognition requires bio-inspired architectures that feature massively parallel processing with a huge number of processing units and inputs. Implementations with binary digital systems and software are not suitable, because they operate sequentially and consume a large amount of power and chip area. Analog circuits, however, are promising for realizing low-power and low-cost massively parallel systems.
The original version of this chapter was revised: The copyright line was incorrect. This has been corrected. The Erratum to this chapter is available at DOI:
In order to realize these processing circuits, neuron MOS devices and circuit architectures were proposed [1,2]. They realize arbitrary multi-input Boolean functions and various parallel analog processing circuits. Cellular automata using neuron MOS devices were also proposed for 2D image processing [3]. A merged analog-digital circuit architecture using pulse width modulation (PWM) signals, which is suitable for low-voltage, deep sub-μm CMOS devices, was also proposed [4,5]. An image recognition system requires an on-chip image sensor. CCD imagers are the dominant components in current video applications. However, they are not compatible with deep-submicron CMOS technologies because special process technologies and high-voltage clocks are required. As reported in recent papers, CMOS imagers are appropriate for integrating various pixel-level processing functions along with image capture [6,7]. This paper proposes the architecture for an image feature associative processor utilizing PWM merged analog-digital circuits.
SYSTEM ARCHITECTURE

PWM SIGNAL PROCESSING
We have proposed using pulse width modulation (PWM) signals for merging analog and digital processing [4,5]. A PWM signal expresses an analog value as a pulse width with a binary voltage or current amplitude. This time-domain expression provides immunity to the dynamic range reduction caused by low-voltage operation. Figure 1 shows a switched current integration technique for PWM arithmetic. Switched current sources (SCSs) convert voltage PWM pulses into current pulses. Current integration on Cint results in asynchronous, parallel addition of those pulse widths. The reduced transition activity gives the advantage of low power dissipation. Charge packet counting converts a small reference charge amount Qref = Cint × Vref to a pulse and removes it from Qint successively during integration [8].
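The arithmetic behind switched current integration and charge packet counting can be illustrated with a small behavioral model. This is only a sketch under stated assumptions: the function names, the unit current of 1 μA, and the values of Cint and Vref are hypothetical and are not taken from the chip. The model also counts packets after integration has finished, whereas the circuit removes them successively during integration.

```python
def integrate_pwm(pulse_widths_ns, i_unit_ua=1):
    """Charge on Cint when each pulse gates one unit current source.

    Q = I * sum(widths). With current in uA and time in ns the
    result is in femtocoulombs (1e-6 A * 1e-9 s = 1e-15 C).
    """
    return i_unit_ua * sum(pulse_widths_ns)


def charge_packet_count(q_int_fc, c_int_pf=1, v_ref_mv=50):
    """Count reference packets Qref = Cint * Vref contained in Qint.

    pF * mV also gives femtocoulombs. Returns (count, residual charge).
    Assumption: packets are removed after integration, which gives the
    same count as removing them during integration in this ideal model.
    """
    q_ref_fc = c_int_pf * v_ref_mv
    count = 0
    while q_int_fc >= q_ref_fc:
        q_int_fc -= q_ref_fc  # remove one charge packet
        count += 1            # and emit one output pulse
    return count, q_int_fc


# Three PWM inputs of 100, 250 and 400 ns are added in parallel.
q_int = integrate_pwm([100, 250, 400])   # 750 fC
print(charge_packet_count(q_int))        # (15, 0)
```

With these illustrative values, Qref is 50 fC, so the 750 fC of integrated charge is digitized as 15 packets with no residue.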
ALGORITHM FOR FEATURE ASSOCIATION
The processing functions are image sensing, image pre-processing, feature extraction, and pattern matching. The input image is sensed at the pixel array and read out as gray-scale or binary data. Pre-processing is necessary to extract various features effectively from sensed natural images. It includes several spatial filters for noise reduction, edge enhancement, and averaging, as well as thresholding and thinning. To realize flexible and high-speed image recognition, two kinds of features, global and local, are extracted. The global features include (1) block averages and (2) X- and Y-projections of the thresholded image. They compress redundant local information and select the search area and the reference vectors to be matched, which makes local feature association efficient. The local features are associated by pattern matching of a sub-block of the input data against the reference vector data. The most similar reference vector code is obtained by calculating the Manhattan distance and searching for the minimum distance, which requires very high computing power. These two kinds of features are combined in higher-level processing for recognition of complex objects and faces.
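The global and local feature steps described above can be sketched in software as follows. This is a minimal illustration and not the hardware algorithm: the function names, the tiny test image, and the reference vectors are hypothetical, and the convention that the X-projection sums each column while the Y-projection sums each row is an assumption.

```python
def threshold(img, level):
    """Binarize a gray-scale image given as a list of rows."""
    return [[1 if p >= level else 0 for p in row] for row in img]


def projections(binary):
    """Return (x_proj, y_proj): column sums and row sums (assumed convention)."""
    x_proj = [sum(col) for col in zip(*binary)]
    y_proj = [sum(row) for row in binary]
    return x_proj, y_proj


def block_average(img, size):
    """Average over non-overlapping size x size blocks.

    Assumes the image dimensions are multiples of size.
    """
    rows, cols = len(img), len(img[0])
    return [[sum(img[y][x]
                 for y in range(by, by + size)
                 for x in range(bx, bx + size)) / (size * size)
             for bx in range(0, cols, size)]
            for by in range(0, rows, size)]


def manhattan(a, b):
    """Manhattan (L1) distance between two equal-length vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def best_match(block, refs):
    """Return (code, distance) of the reference vector closest to block."""
    return min(((code, manhattan(block, ref)) for code, ref in refs.items()),
               key=lambda item: item[1])


img = [[0, 9, 0, 0],
       [0, 9, 0, 0],
       [9, 9, 9, 0],
       [0, 9, 0, 0]]
print(projections(threshold(img, 5)))   # ([1, 4, 1, 0], [1, 1, 3, 1])
print(block_average(img, 2))            # [[4.5, 0.0], [6.75, 2.25]]

refs = {"cross":  [0, 1, 0, 1, 1, 1, 0, 1, 0],
        "corner": [0, 0, 0, 0, 1, 1, 0, 1, 0],
        "end":    [0, 0, 0, 0, 1, 0, 0, 1, 0]}
print(best_match([0, 1, 0, 1, 1, 1, 0, 0, 0], refs))   # ('cross', 1)
```

In this toy case the projections localize the stroke crossing in the second column and third row, which is the kind of information the text describes using to restrict the search area before the more expensive distance search.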
SYSTEM ARCHITECTURE AND CIRCUITS
IFAP is composed of a CMOS imager, a cellular automaton (CA), and a pattern matching processor (PMP), as shown in Fig. 2. The inputs are optical image data focused on the sensor plane. The outputs are the associated codes of local features and global features. The three functional blocks communicate with each other via PWM signals over the global PWM bus, which has 56 parallel bits and a programmable pulse width in the range of 100 ns to 1 μs. An 8-bit binary data bus is used for output of the reference code and the distance value, and for input of the reference data. The imager has four functions: (1) to read out each pixel value, (2) to threshold each pixel value, (3) to compute the X- and Y-projections of the thresholded image, and (4) to average the pixel values of a block. Figure 3 shows a block diagram of the imager. Each pixel executes nondestructive conversion of the input light intensity to PWM signals using a simple voltage comparator [9]. The pixel has two operation modes: gray-scale conversion and thresholding. Pixels asserted by the column and row address shift registers become active for readout. An address signal generator supplies linear upward ramp voltages to the selected column pixels in gray-scale conversion, or a reference voltage in thresholding. Row pixels share a readout bus that is driven by the current pulses in parallel in the block access mode.
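The two pixel operation modes can be pictured with a simple behavioral model of a comparator driven by either a ramp or a fixed reference. This is a sketch under assumptions that are not stated in the text: the output is taken to be high while the ramp is below the stored pixel voltage, the ramp spans 0 to `v_max` over `t_max_ns`, and all names and voltage values are hypothetical.

```python
def pixel_pwm_width(v_pix, mode, t_max_ns=1000, v_max=1.0,
                    v_ref=0.5, steps=100):
    """Pulse width (ns) produced by one pixel comparator.

    mode == "gray":      compare v_pix against a linear upward ramp;
                         the pulse stays high while ramp < v_pix.
    mode == "threshold": compare v_pix against a fixed reference;
                         the pulse is full width or absent.
    The comparator polarity is an assumption of this sketch.
    """
    if mode == "threshold":
        return t_max_ns if v_pix > v_ref else 0
    if mode != "gray":
        raise ValueError("mode must be 'gray' or 'threshold'")
    dt = t_max_ns / steps
    high_steps = sum(1 for k in range(steps) if v_max * k / steps < v_pix)
    return high_steps * dt


print(pixel_pwm_width(0.25, "gray"))        # 250.0
print(pixel_pwm_width(0.75, "threshold"))   # 1000
print(pixel_pwm_width(0.25, "threshold"))   # 0
```

In gray-scale mode the pulse width is proportional to the pixel voltage, so the same comparator serves both as a converter to PWM and as a one-bit thresholder depending only on the signal supplied by the address signal generator.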
The signal processing circuit consists of an array of row counters and SCSs, one of each connected to each readout bus, and a charge packet counter (CPC). In the projection calculation, pixels operate in the thresholding mode and generate voltage pulses. X-projection values are obtained by counting the pulses. Y-projection values are obtained by the switched current integration technique: the voltage pulses from the pixels are converted to current pulses by the SCSs, and the currents are integrated on the capacitor Cint. The Y-projection values are then obtained by converting the integrated charge to digital data with the CPC.
CELLULAR AUTOMATON
A CA cell is connected with its 8 nearest neighbors through PWM current pulses. The templates provide the connection coefficients and control the CA function. The functions include spatial filtering to reduce noise, edge enhancement, and thinning. A schematic of the CA cell circuit and the templates are shown in Fig. 4. The CA has two modes of operation: a multi-bit mode and a binary mode. In the multi-bit mode, spatial filters for noise reduction and image enhancement are provided. In the binary mode, templates for dilation, erosion, edge detection and thresholding are provided. By combining these templates, effective pre-processing of sensed image data and binary image thinning can be realized. Data input/output to the cell array is executed through the column-parallel local PWM bus, which is connected to the global PWM bus.
The cell consists of switched current sources (SCSs), two capacitors (C1 and C2), a latch comparator and an inverter chopper comparator. The state of each cell is represented by the charge on the capacitor C2. The charge is converted to a PWM signal by the inverter chopper comparator, while the latch comparator detects the polarity of the cell state in the multi-bit mode. The PWM output signal is transmitted to the 8 neighbor cells and to the self-input. The templates are determined by switching the SCSs. Multiplication by the template coefficients and addition are carried out by PWM switched current integration on the capacitor C1. Output PWM generation and the multiplication/addition operations are carried out in parallel with pipelined timing. Each template is executed within only one cycle time because of the fully parallel operation of the cellular automaton. The cycle time is the sum of the maximum pulse width of the PWM signal and a reset time.
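One template cycle of the cellular automaton in binary mode can be sketched as a weighted sum over the 3×3 neighborhood followed by a comparison. This is an illustrative software model only: the all-ones template, the threshold values used for dilation and erosion, and the zero padding at the array border are assumptions, since the actual template coefficients of Fig. 4 are not reproduced here.

```python
def ca_step(state, template, threshold):
    """Apply one 3x3 template to every cell of a binary image.

    Each cell sums template coefficient * neighbor output over itself and
    its 8 neighbors (the role of integration on C1), then compares the
    sum with a threshold. Cells outside the array are treated as 0.
    """
    h, w = len(state), len(state[0])
    nxt = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += template[dy + 1][dx + 1] * state[yy][xx]
            nxt[y][x] = 1 if acc >= threshold else 0
    return nxt


ONES = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]   # hypothetical template


def dilate(state):
    return ca_step(state, ONES, 1)   # any active neighbor turns the cell on


def erode(state):
    return ca_step(state, ONES, 9)   # all nine inputs must be active


square = [[0, 0, 0, 0, 0],
          [0, 1, 1, 1, 0],
          [0, 1, 1, 1, 0],
          [0, 1, 1, 1, 0],
          [0, 0, 0, 0, 0]]
eroded = erode(square)               # only the center cell survives
print(dilate(eroded) == square)      # True
```

The software loops visit cells one after another, whereas in the chip every cell evaluates its template simultaneously, which is why one template costs a single cycle regardless of array size.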
PATTERN MATCHING PROCESSOR
A block diagram of the Pattern Matching Processor (PMP) is shown in Fig. 5. The PMP is composed of a PWM-to-Digital Converter (WDC) array, picture RAM (p-RAM), reference RAM (r-RAM), a Digital-to-PWM Converter (DWC) array, a processing element (PE) array, a charge-to-pulse converter (CPC) array, a winner-take-all (WTA) array and an address control. The input PWM image data are converted to binary digital data by the WDC, which is composed of binary counters, and stored in static CMOS RAM. The stored input data are converted to PWM by the DWC and supplied to the PE array. The reference data are also converted to PWM by the DWC and supplied to the PE array. The Manhattan distance is calculated by the PE array, using PWM arithmetic, and the CPC array, as shown in Fig. 5. The PE is composed of an EXOR gate, which calculates the absolute difference of two PWM signals, |Xi - Ri|. All difference data are summed in parallel by current integration of PWM pulses, as shown in Figure 1. The PWM pulses from ny PEs are added simultaneously, and the additions are carried out sequentially nx times. Thus, the Manhattan distance for nx × ny block matching is calculated. Each distance value, which is represented by the integrated charge, is converted to binary digital data by the CPC; nx and ny are programmable in the range from 1 to 8. The minimum distance is searched by the WTA array, which uses binary digital techniques based on word-parallel, bit-serial comparison. A new distance value transferred from the CPC to Reg. 1 is compared with the last minimum distance value, which is stored in Reg. 2. If the new value is smaller than the last value, it replaces the stored minimum.
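The EXOR-based difference and the running minimum search can be checked with a short behavioral model. It is a sketch with several assumptions: pulse widths are integers in units of one time step, both pulses are assumed to start at the same instant (so the EXOR output is high for exactly the absolute difference), and the function and variable names are hypothetical.

```python
def xor_pulse_width(a, b, t_max=16):
    """Width of EXOR(pulse a, pulse b) when both pulses start at t = 0.

    A pulse of width w is high for time steps 0..w-1, so the EXOR output
    is high for |a - b| steps.
    """
    return sum(1 for t in range(t_max) if (t < a) != (t < b))


def pmp_distance(block, ref, t_max=16):
    """Manhattan distance of an nx x ny block.

    block and ref are lists of nx columns, each holding ny pulse widths.
    The ny differences of one column are added at once (parallel PEs);
    the nx columns are accumulated one after another.
    """
    total = 0
    for col_x, col_r in zip(block, ref):
        total += sum(xor_pulse_width(x, r, t_max)
                     for x, r in zip(col_x, col_r))
    return total


def winner_take_all(distances):
    """Running minimum search: return (index, value) of the smallest distance."""
    best_index, current_min = None, None
    for index, new_value in enumerate(distances):
        if current_min is None or new_value < current_min:
            best_index, current_min = index, new_value
    return best_index, current_min


print(xor_pulse_width(3, 7))                               # 4
print(pmp_distance([[3, 7], [0, 5]], [[7, 7], [2, 1]]))    # 10
print(winner_take_all([12, 9, 15, 9]))                     # (1, 9)
```

Because the comparison is strict, the first of several equal minima is kept in this model; how the chip resolves ties is not stated in the text.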
CHIP FABRICATION AND EXPERIMENTAL RESULTS
An experimental IFAP chip was designed and fabricated with 0.8 μm p-well CMOS technology with double-poly and double-metal layers. A block layout of the IFAP chip is shown in Fig. 6. The chip size is 15 mm × 15 mm. The image sensor array with 56 × 56 pixels, the address signal generator, and the signal processing circuit are integrated in a 6 mm × 6 mm chip area. Each pixel circuit has a PN junction photodetector, an analog storage capacitor and PWM processing circuits in a 100 μm × 100 μm area. The cellular automaton with 40 × 50 cells and the timing and template control circuits are integrated in a 7.5 mm × 7.5 mm area. Each cell includes about 150 MOS devices and 2 capacitors in a 159 μm × 122 μm area. The PMP is integrated in a chip area of about 50 mm²; the PE array size is 8 × 28 and the CPC array size is 28. The data storage for the input of the PMP consumes a relatively large chip area of 8.1 mm × 3.2 mm because, in addition to the p-RAM, the WDC and DWC arrays are necessary. In the future, analog memories will be available to store PWM data and reduce this chip area to one-half that of the conventional digital approach.
CMOS FUNCTIONAL IMAGER
PATTERN MATCHING PROCESSOR
Parallel distance calculation for the local feature association was carried out. Input and output pulses measured with an HP16500B logic analyzer are shown in Fig. 8. The input image data are a part of a Chinese character in an 8 × 16 pixel block. The reference data are one of the cross-point patterns, represented by an 8 × 8 pixel block. The smallest distance was obtained at the sixth output of the CPC array. The PMP consumes 120 mW at a 3.3 V power supply. The processing speed per unit power dissipation was 6.75 GOPS/W. The power dissipation of the PMP is one-fourth of the simulated value for binary digital circuits in the same CMOS technology.
APPLICATIONS TO HANDWRITTEN CHARACTER RECOGNITION
As a typical application of IFAP, feature association for the recognition of handwritten Chinese characters was demonstrated by simulation. An input character, the thinned character, its X- and Y-projections, search areas, reference vectors, associated local features and a feature map are shown in Fig. 9. Using the X- and Y-projections of the binary image, the search windows and the candidate reference vectors were effectively restricted for local feature association. The local features were matched with reference vectors of 3 × 3 pixels, composed of 8 directional terminations, 12 branches, 8 corners and 2 crosses. The associated local features were represented by a feature map, which was compacted while maintaining the relative position of each local feature. Other feature maps are also shown in Fig. 9. These results show that deformation and size differences of handwritten characters are compensated by this algorithm.
The features associated by IFAP are transferred to a higher-level processor and linked with higher-level symbolic information. The flexible feature association realized with IFAP will become key to intelligent recognition systems.
CONCLUSIONS
An Image Feature Associative Processor (IFAP) with an on-chip imager and parallel processors, which utilize a merged A-D circuit architecture based on the pulse width modulation (PWM) method, was proposed. An experimental IFAP chip was designed and fabricated in a 15 mm × 15 mm chip with 0.8 μm CMOS technology. The PWM A-D merged architecture drastically reduces power dissipation and chip area.
