Recently, a real-time clustering microchip based on the ART1 [1] algorithm has been reported [2]. This chip was able to classify 100-bit input patterns into up to 18 categories. However, its high area consumption (1 cm²) caused a very poor yield (6%).
I. Introduction
Recently, a real-time clustering microchip neural engine based on the ART1 architecture has been reported [2]. It is based on a slightly modified version of the ART1 algorithm, which was shown to preserve all the original computational properties [3] while having a more VLSI-friendly algorithmic structure. The reported ART1 chip was able to cluster binary input patterns of up to 100 pixels into up to 18 different categories. The chip was able to classify an input pattern and learn its relevant characteristics by updating its internal knowledge, all in less than 1.8 µs. The internal circuit architecture of the chip also allowed modular expansion of the clustering system. Assembling an N × M array of these chips would result in ART1 systems able to cluster N × 100 pixel input patterns into up to M × 18 categories.
Unfortunately, the resulting area consumption (and cost) of the chip was extremely high (1 cm²), and consequently its yield performance was extremely low (6%). Nevertheless, due to the fault-tolerant nature of the algorithm, most of the faulty chips were still able to perform satisfactorily [2].
In this paper, a new ART1 chip is presented which solves the yield problem by reducing chip area. After careful characterization of the MOS transistor electrical parameter mismatch of the technological process to be used, it was possible to identify the maximum chip area for which the parameter variations would remain within the limits needed to preserve the required system operation precision. It was found that for the ES2 1.0 µm CMOS process, for transistors of size 10 µm × 10 µm, spread over an area of the order of 2.5 mm × 2.5 mm, and for current levels around 10 µA, the transistor current standard deviation is around $\sigma(I) = 1\%$. Taking this into account, we designed and fabricated an ART1 chip capable of clustering 50-bit input patterns into up to 10 categories, with a yield performance of 98%, and whose area is 15 times less than that of the first prototype. The chip showed a very robust behavior that allowed us to implement some multi-chip ART1 systems.
II. VLSI-Friendly Algorithm and Its Hardware Implementation
An ART1 system is a neural associative memory capable of generating, in an unsupervised way, stable recognition codes in response to arbitrarily many and complex binary input patterns. The VLSI-friendly ART1 algorithm operates as follows:
1. All the binary weights $z_{ij}$ are set to '1'.
2. An input pattern $I$ is applied to the system.
3. The choice function $T_j$ is computed for every category $j$.
4. Winner-Take-All: the category $J$ whose $T_J$ is maximum is selected. The corresponding output $y_J$ is set to '1' while all others are set to $y_{j \neq J} = 0$.
5. The vigilance criterion is checked for the winning category. If $\rho |I| > |I \cap z_J|$ the criterion is not satisfied, $T_J$ is forced to '0' and a new winner is selected. If $\rho |I| \le |I \cap z_J|$ the weights $z_J$ are updated according to the law $z_J(\mathrm{new}) = I \cap z_J(\mathrm{old})$.

In the hardware implementation, the array of input cells generates a current $L_A \sum_i I_i = L_A |I|$, which enters a tunable-gain current mirror of gain $\rho$. This mirror distributes a current $\rho L_A |I|$ to the input of each of the ten current comparators $CC_j$.

Each synapse outputs two currents:
A current $L_A I_i z_{ij}$, which flows to a node common to all the synapses in the $j$-th row, resulting in a total current $L_A \sum_i I_i z_{ij} = L_A |I \cap z_j|$ that enters the $j$-th current comparator $CC_j$.
A current $L_A I_i z_{ij} - L_B z_{ij}$, which contributes to a total current $T_j$ that enters the $j$-th branch of the WTA.
Each current comparator receives a total current $L_A |I \cap z_j| - \rho L_A |I|$ and compares it against '0'. If this current is negative, the vigilance criterion is not satisfied and signal $c_j$ is activated, preventing current $T_j$ from competing in the WTA.
Once a winning node ($y_J = 1$) is stable, the signal "LEARN" is activated and the weights $z_J$ are updated by changing each stored value $z_{iJ}$ to $I_i z_{iJ}$.
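The clustering steps and the comparator/WTA arrangement described in this section can be summarized in a short behavioral sketch. This is only an illustration under stated assumptions, not the chip's circuit model: the function name `art1_present`, the default values of `L_A` and `L_B`, and the choice function `T_j = L_A|I ∩ z_j| - L_B|z_j|` (taken here as the plain sum of the per-synapse WTA currents, ignoring any offset term) are assumptions. As in the hardware description, categories that fail the vigilance test are simply excluded from the competition.

```python
import numpy as np

def art1_present(I, z, rho, L_A=3.0, L_B=1.0):
    """Present one binary pattern I to an ART1 layer with weights z (M x N).

    Returns the index J of the winning category, or None if no category
    passes the vigilance test. z is updated in place.
    L_A and L_B are hypothetical values chosen only so that L_A > L_B > 0.
    """
    I = np.asarray(I, dtype=z.dtype)
    overlap = z @ I                              # |I ∩ z_j| for every row j
    # Assumed choice function: sum of the per-synapse WTA currents.
    T = L_A * overlap - L_B * z.sum(axis=1)
    # Comparator: category j may compete only if |I ∩ z_j| - rho*|I| >= 0.
    passes = overlap >= rho * I.sum()
    if not passes.any():
        return None
    # Winner-Take-All among the categories that passed the vigilance test.
    J = int(np.argmax(np.where(passes, T, -np.inf)))
    z[J] &= I                                    # z_J(new) = I ∩ z_J(old)
    return J

# Usage: 3 categories, 4-bit patterns, all weights initialised to '1'.
z = np.ones((3, 4), dtype=np.int64)
first = art1_present([1, 1, 0, 0], z, rho=0.5)
second = art1_present([0, 0, 1, 1], z, rho=0.5)
repeat = art1_present([1, 1, 0, 0], z, rho=0.5)
```

In this usage example the first pattern commits one category, the disjoint second pattern is rejected by that category's vigilance test and commits another, and repeating the first pattern selects the category it originally trained.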
III. Yield and Area Optimization
To obtain good system precision it is important that all $L_A$ and $L_B$ synapse current sources match within a certain limit. In our first prototype, a tree-like current-mirror structure was implemented to generate all $L_A$ and $L_B$ currents from two external current references. Each external current reference feeds a multiple-output current mirror, whose output currents in turn drive another stage of multiple-output current mirrors. Each multiple-output current mirror has at most ten outputs and is laid out using common-centroid techniques to reduce the gradient-induced mismatch. After a few stages, several thousand $L_A$ and $L_B$ currents are available, which match with a precision better than 1% for current levels higher than 5 µA. However, this structure is very area consuming, which results in a very poor yield. That prototype had a die area of 1 cm², with a 100-node $F_1$ layer and an 18-node $F_2$ layer, and exhibited a yield performance of 6%.
Our mismatch characterization results showed that it was possible to eliminate the tree-like current-mirror structure while maintaining a current precision better than 1%. A new ART1 prototype was designed with an area 15 times less than that of the first prototype and a 98% yield performance. This prototype chip occupies an area of 2.5 mm × 2.2 mm and has a 50-node $F_1$ layer and a 10-node $F_2$ layer.
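To get a feeling for how strongly yield depends on die area, one can plug the reported figures into a simple Poisson defect model, $Y = e^{-A D_0}$. This model is an assumption introduced here for illustration only; it is not the analysis used in this paper, and the function names below are hypothetical. It accounts only for random point defects, not for the parametric mismatch discussed in this section.

```python
import math

def defect_density_from_yield(area_cm2, yield_fraction):
    """Invert Y = exp(-A * D0) to get the defect density D0 (per cm^2)."""
    return -math.log(yield_fraction) / area_cm2

def poisson_yield(area_cm2, defect_density):
    """Poisson yield model (an assumption, not taken from the paper)."""
    return math.exp(-area_cm2 * defect_density)

# First prototype: 1 cm^2 die, 6% yield (figures reported in the text).
d0 = defect_density_from_yield(1.0, 0.06)

# New prototype: area 15 times smaller, same assumed defect density.
predicted = poisson_yield(1.0 / 15.0, d0)

print(f"assumed defect density: {d0:.2f} per cm^2")
print(f"predicted yield for the smaller die: {predicted:.2f}")
```

Under this rough model the smaller die would be expected to yield roughly 83%, which points in the same direction as the 98% reported above; the difference is a reminder that such a model is only a coarse guide.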
For the mismatch characterization, a special-purpose chip was designed in the ES2 1.0 µm technology [4]. The chip contains a matrix of cells, each of them containing PMOS and NMOS transistors of different sizes, plus decoding circuitry. A simplified diagram of the chip and of the experimental set-up used to measure the transistors is depicted in Fig. 3. All NMOS transistors in the chip have their sources connected together to pin S. All NMOS transistors share their drains at pin DN and all PMOS transistors have their drains connected to pin DP. Every transistor in the chip has its gate short-circuited to its source, except for one selected pair of NMOS and PMOS transistors, whose gates are connected to pin G. A host computer controls the selection decoder and a curve tracer (HP4145). If pin DP is left unconnected and the curve tracer is connected to pins S, DN and G, each NMOS transistor can be characterized separately. In a similar way, if pin DN is left unconnected, each PMOS transistor can be measured by connecting the curve tracer to pins S, DP and G.
NMOS and PMOS transistors of size 10 µm × 10 µm, spread over an area of 2.5 mm × 2.5 mm, were forced to the same $V_{GS}$ and $V_{DS}$ voltages so that their nominal current was around 10 µA. The measured currents flowing through the transistors are depicted in Fig. 4. Fig. 4(a) shows the currents flowing through the NMOS transistors as a function of the transistor position in the array.
Eight chips could be fully characterized. Each chip contains several arrays of NMOS and PMOS transistors of different sizes spread over an area of 2.5 mm × 2.5 mm. Table I shows the results for NMOS transistors of size 10 µm × 10 µm driving a nominal current of 10 µA. The table shows the noise error component, the gradient error component, the ratio $r$, and the total error component $\sigma_T(I_0)$ (gradient + noise). Table II contains the same information for PMOS transistors. As can be seen, for these chip dimensions, this current level and these transistor geometries, the noise error contribution is of the same order as or higher than the gradient error contribution, and the total current error $\sigma_T(I_0)$ is always less than 1%. Consequently, under these conditions it is possible to avoid the area-consuming tree-like mirror structures and directly implement a simple current mirror with all the needed outputs. This is the approach used in the present ART1 chip.
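One plausible way to separate a gradient component from a noise component in such position-dependent current measurements is to fit a plane to the relative current deviation and treat the residual as noise. The sketch below is an assumption about how such a split could be done, not the procedure used for Tables I and II; the function name `split_mismatch`, the planar gradient model, and the synthetic data (an 8 × 8 grid, 0.2%/mm gradient, 0.4% noise) are all hypothetical.

```python
import numpy as np

def split_mismatch(x_mm, y_mm, currents):
    """Split relative current deviations into a planar gradient and noise.

    Returns (gradient_span, noise_sigma, total_sigma), all relative to the
    mean current. The planar gradient model is an assumption.
    """
    x_mm = np.asarray(x_mm, dtype=float)
    y_mm = np.asarray(y_mm, dtype=float)
    currents = np.asarray(currents, dtype=float)
    rel = currents / currents.mean() - 1.0           # relative deviation
    design = np.column_stack([x_mm, y_mm, np.ones_like(x_mm)])
    coef, *_ = np.linalg.lstsq(design, rel, rcond=None)
    plane = design @ coef                            # fitted gradient surface
    gradient_span = float(plane.max() - plane.min())
    noise_sigma = float(np.std(rel - plane, ddof=3)) # 3 fitted parameters
    total_sigma = float(np.std(rel, ddof=1))
    return gradient_span, noise_sigma, total_sigma

# Synthetic data only: 8 x 8 transistors over 2.5 mm x 2.5 mm, 10 uA nominal.
rng = np.random.default_rng(0)
gx, gy = np.meshgrid(np.linspace(0.0, 2.5, 8), np.linspace(0.0, 2.5, 8))
x, y = gx.ravel(), gy.ravel()
currents = 10e-6 * (1.0 + 0.002 * x + rng.normal(0.0, 0.004, x.size))

gradient_span, noise_sigma, total_sigma = split_mismatch(x, y, currents)
print(f"gradient span: {gradient_span:.4f}, noise sigma: {noise_sigma:.4f}, "
      f"total sigma: {total_sigma:.4f}")
```

With data of this kind, a total relative standard deviation below 1% would correspond to the condition under which the text argues that a single multiple-output current mirror is sufficient.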
IV. Experimental Results
All ten fabricated chip samples were fully operational, and we could not detect any fault in the subcircuits of any of them. All system components could be isolated and independently characterized. The performance of the different subcircuits was similar to that of the first prototype [2].
Although the chip is analog in nature, its inputs and outputs are digital. Therefore, it is possible to test the system-level behavior using digital test equipment. We used the HP82000 test equipment to fully test the system-level operation. The system proved to be very robust, and therefore a multichip system was easy to assemble. The operation of two multichip systems was also tested: a two-chip ART1 system and a three-chip system forming an ARTMAP architecture. The two-chip ART1 system consists of two horizontally assembled ART1 chips. The resulting system is able to cope with 100-bit input patterns.
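A simple way to see why two 50-bit chips placed side by side can handle a 100-bit pattern is that the overlap $|I \cap z_j|$ of a row is additive over the two halves of the pattern. The sketch below only checks this arithmetic on random data; it assumes, without confirmation from the text, that the per-chip row contributions are simply summed, and the helper name `overlap` is hypothetical.

```python
import numpy as np

def overlap(pattern, weights):
    """Number of positions where both the input bit and the weight are 1."""
    return int(np.sum(pattern & weights))

rng = np.random.default_rng(0)
I = rng.integers(0, 2, size=100)      # 100-bit input pattern
z_j = rng.integers(0, 2, size=100)    # weights of one category row

# Each hypothetical chip sees one 50-bit half of the pattern and of the row.
left = overlap(I[:50], z_j[:50])
right = overlap(I[50:], z_j[50:])
full = overlap(I, z_j)

print(left + right == full)
```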
The ARTMAP architecture consists of two ART1 subsystems connected through an Inter-ART module as depicted in Fig. 6, where a is an $N_a$-dimensional input vector to the first subsystem, ART1a, and b is an $N_b$-dimensional input vector to the second subsystem, ART1b. An ARTMAP system is a supervised learning neural network that learns the correspondence between two simultaneous input patterns a and b. The Inter-ART module is simply an $M_a \times M_b$ array of binary weights which learns the correspondence between the ART1a category that classifies pattern a and the ART1b category that classifies pattern b. An ARTMAP hardware system was assembled using two ART1 chips and an extra chip for the Inter-ART module. The system-level operation of the ARTMAP hardware system has also been tested using the HP82000 digital test equipment.
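The role of the Inter-ART array of binary weights can be illustrated with a minimal behavioral sketch. It assumes the usual ARTMAP map-field convention (all weights start at '1', and learning intersects the row of the winning ART1a category with a one-hot code for the winning ART1b category); the class name `InterArt` and its methods are hypothetical, and mismatch handling between the two subsystems is deliberately left out.

```python
import numpy as np

class InterArt:
    """Hypothetical M_a x M_b binary map between ART1a and ART1b categories."""

    def __init__(self, M_a, M_b):
        # Assumed initial state: every association is still allowed.
        self.w = np.ones((M_a, M_b), dtype=np.uint8)

    def consistent(self, J, K):
        """True if ART1a category J may still be associated with category K."""
        return bool(self.w[J, K])

    def learn(self, J, K):
        """Intersect row J with the one-hot code of ART1b category K."""
        one_hot = np.zeros(self.w.shape[1], dtype=np.uint8)
        one_hot[K] = 1
        self.w[J] &= one_hot

    def predict(self, J):
        """Return the learned ART1b category for J, or None if not learned."""
        row = self.w[J]
        return int(np.argmax(row)) if row.sum() == 1 else None

# Usage with two 10-category ART1 chips, as in the prototype described above.
inter = InterArt(10, 10)
before = inter.predict(3)     # nothing learned yet
inter.learn(3, 7)             # ART1a category 3 co-occurred with ART1b category 7
after = inter.predict(3)
```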
