An ART1 microchip and its use in multi-ART1 systems by Serrano-Gotarredona, Teresa & Linares-Barranco, Bernabé
An ART1 Microchip and its use in Multi-ART1
Systems
Teresa Serrano-Gotarredona and Bernabé Linares-Barranco
National Microelectronics Center (CNM), Ed. CICA, Av. Reina Mercedes s/n, 41012 Sevilla,
SPAIN. Phone: (34) 5-4239923, FAX: (34) 5-4239923, E-mail: bernabe@cnm.us.es
Abstract
Recently, a real-time clustering microchip based
on the ART1 [1] algorithm has been reported [2].
This chip was able to classify 100-bit input patterns
into up to 18 categories. However, its high area
consumption (1 cm²) caused a very poor yield (6%).
In this paper, an improved prototype is presented. In
this chip, a different approach has been used to
implement the most area consuming elements. The
new chip can cope with 50-bit input patterns and
classify them into up to 10 categories. Its area is 15
times less than that of the first prototype and it
exhibits a yield performance of 98%. Due to its higher
robustness, multichip systems are easily assembled.
I. Introduction
Recently, a real-time clustering microchip neural
engine based on the ART1 architecture has been reported
[2]. It is based on a slightly modified version of the
ART1 algorithm which was shown to preserve all its
original computational properties [3], but has a more
VLSI-friendly algorithmic structure. The reported ART1
chip was able to cluster binary input patterns of up to 100
pixels into up to 18 different categories. The chip was
able to classify an input pattern and learn its relevant
characteristics by updating its internal knowledge, all in
less than 1.8 µs. The chip's internal circuit architecture
also allowed modular expansion of the clustering
system. Assembling an N x M array of these chips
would result in ART1 systems able to cluster N x 100
pixel input patterns into up to M x 18 categories.
Unfortunately, the resulting area consumption (and cost)
of the chip was extremely high (1 cm²), and consequently
its yield performance was extremely low (6%).
Nevertheless, due to the fault-tolerant nature of the
algorithm, most of the faulty chips still were able to
perform satisfactorily [2].
In this paper, a new ART1 chip is presented which
solves the yield problem by reducing chip area. After
careful MOS transistor electrical parameter mismatch
characterization of the technological process to be used,
it was possible to identify the maximum chip area for
which the parameter variations would remain within the
necessary limits to preserve the required system
operation precision. It was found that for the ES2 1.0 µm
CMOS process, for transistors of size 10 µm × 10 µm,
spread over an area of the order of 2.5 mm × 2.5 mm, and
for current levels around 10 µA, the transistor current
standard deviation is around $\sigma(I) = 1\%$. Taking this
into account, we designed and fabricated an ART1 chip
capable of clustering 50-bit input patterns into up to 10
categories, with a yield performance of 98%, and whose
area is 15 times less than that of the first prototype. The
chip showed a very robust behavior that allowed us to
implement some multi-chip ART1 systems.
II. VLSI-Friendly Algorithm and Its
Hardware Implementation
An ART1 system is a neural associative memory
capable of generating, in an unsupervised way, stable
recognition codes in response to arbitrarily many and
complex binary input patterns. An ART1 architecture
consists of two layers. The F1 or input layer has N
nodes, each of them receiving a component of the input
vector $I = (I_1, \ldots, I_N)$. Each of the M nodes in the F2
or category layer represents a cluster of input patterns or
learned category. Both layers are fully connected by a
matrix of binary weights $z_{ij}$. The weight vector that
connects to the j-th F2 node, $z_j = (z_{1j}, \ldots, z_{Nj})$,
characterizes the learned category j. Fig. 1 depicts
the operation sequence of the VLSI-Friendly ART1
algorithm:
1. All the binary weights $z_{ij}$ are set to '1'.
2. An input pattern $I$ is applied to the system.
3. A "choice function" $T_j$ is computed for each category j.
   This function, $T_j = L_A |I \cap z_j| - L_B |z_j| + L_M$,
   is a measure of the similarity between the input pattern $I$
   and the learned vector $z_j$ corresponding to category j.
4. The category J whose $T_J$ is maximum is selected.
   The corresponding output $y_J$ is set to '1' while all
   others are set to $y_{j \neq J} = 0$.
Copyright 1997 IEEE 673
5. The vigilance criterion is checked for the winning
   category.
   If $\rho |I| > |I \cap z_J|$, the criterion is not satisfied, $T_J$ is
   forced to '0' and a new winner is selected.
   If $\rho |I| \le |I \cap z_J|$, the weights $z_J$ are updated
   according to the law
   $$z_J(\mathrm{new}) = I \cap z_J(\mathrm{old}) . \qquad (1)$$
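
The five steps above can be sketched in software as follows. This is only an illustrative model of the algorithm, not a description of the chip: the function name `present` and the values chosen for $L_A$, $L_B$ and $L_M$ are assumptions (the sketch assumes $L_A > L_B > 0$ and $L_M > L_B N$ so that every $T_j$ stays positive), and removing a reset category from the candidate list stands in for forcing $T_J$ to '0'.

```python
def present(I, z, rho, LA=2.0, LB=1.0, LM=10.0):
    """Present one binary pattern I to the weight matrix z (M lists of N bits).

    Updates z in place and returns the index of the category that learned
    the pattern, or None if every category fails the vigilance test.
    LA, LB and LM are illustrative values, not taken from the chip.
    """
    norm_I = sum(I)
    # Step 3: choice function T_j = LA*|I and z_j| - LB*|z_j| + LM.
    overlap = [sum(a & b for a, b in zip(I, zj)) for zj in z]
    T = [LA * ov - LB * sum(zj) + LM for ov, zj in zip(overlap, z)]
    candidates = list(range(len(z)))
    while candidates:
        # Step 4: winner-take-all (ties go to the lowest index).
        J = max(candidates, key=lambda j: T[j])
        # Step 5: vigilance test; a failing winner is taken out of the race.
        if rho * norm_I > overlap[J]:
            candidates.remove(J)
            continue
        # Learning law (1): z_J(new) = I and z_J(old).
        z[J] = [a & b for a, b in zip(I, z[J])]
        return J
    return None


N, M = 6, 3
z = [[1] * N for _ in range(M)]   # Step 1: all weights set to '1'
A = [1, 1, 1, 0, 0, 0]
B = [0, 0, 0, 1, 1, 1]
print([present(p, z, rho=0.7) for p in (A, B, A)])
```

With these assumed parameters the two disjoint patterns end up in different categories, and presenting the first pattern again selects the category that learned it.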
Fig. 2(a) shows the schematic of the circuit
implementing an ART1 network with 50 input nodes and
10 category nodes. The schematic of a synapse Sij is
depicted in Fig. 2(b) and Fig. 2(c) shows the schematic
of an input cell Ci.
Fig. 1: Algorithmic Operation Description of VLSI-Friendly
ART1 System
Fig. 2: (a) Schematic of the Circuit that Implements the
VLSI-Friendly ART1 Algorithm, (b) Detailed Schematic of a
Synapse, and (c) Schematic of an Input Cell
The array of input cells generates a current
$L_A \sum_i I_i = L_A |I|$, which enters a tunable-gain current
mirror of gain $\rho$. This mirror distributes a current
$\rho L_A |I|$ to the input of ten current comparators $CC_j$.
Each synapse outputs two currents:
• A current $L_A I_i z_{ij}$, which flows to a common node for
  all the synapses in the j-th row, resulting in a total
  current $L_A \sum_i I_i z_{ij}$ that enters the current
  comparator $CC_j$.
• A current $L_A I_i z_{ij} - L_B z_{ij}$, which results in a total
  current $T_j = L_A \sum_i I_i z_{ij} - L_B \sum_i z_{ij} + L_M$
  that enters the j-th branch of the winner-take-all (WTA) circuit.
Each current comparator receives a total current
$L_A |I \cap z_j| - \rho L_A |I|$ and compares it versus '0'. If this
current is negative, the vigilance criterion is not satisfied
and signal $C_j$ is activated, preventing current $T_j$ from
competing in the WTA.
Once a winning node ($y_J = 1$) is stable, signal
"LEARN" is activated and the weights $z_J$ are updated,
changing their stored values to $I_i z_{iJ}$.
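
The current bookkeeping described above can be mimicked numerically. The following is a minimal sketch, not the chip's circuit: the function names `chip_currents` and `wta` are hypothetical, and the current values (in µA) for $L_A$, $L_B$ and $L_M$ are assumptions chosen only for illustration. It shows that every comparator evaluates its vigilance condition in parallel, so a category can hold the largest $T_j$ and still be kept out of the WTA competition.

```python
def chip_currents(I, z, rho, LA=10.0, LB=9.0, LM=500.0):
    """Return one (comparator_current, T_current, vetoed) tuple per category.

    Currents are in microamps; LA, LB and LM are assumed illustrative values.
    """
    reference = rho * LA * sum(I)        # rho * L_A * |I| from the current mirror
    rows = []
    for zj in z:
        overlap = sum(a * b for a, b in zip(I, zj))
        comparator = LA * overlap - reference      # current entering CC_j
        T = LA * overlap - LB * sum(zj) + LM       # current entering WTA branch j
        rows.append((comparator, T, comparator < 0))
    return rows


def wta(rows):
    """Index of the largest T among non-vetoed categories, or None."""
    allowed = [j for j, row in enumerate(rows) if not row[2]]
    return max(allowed, key=lambda j: rows[j][1]) if allowed else None


I = [1, 1, 1, 1, 0, 0]
z = [[1, 1, 1, 0, 0, 0], [0, 0, 0, 1, 1, 1], [1] * 6]
rows = chip_currents(I, z, rho=0.9)
for j, (cc, T, vetoed) in enumerate(rows):
    print(j, round(cc, 3), round(T, 3), vetoed)
print("winner:", wta(rows))
```

In this example category 0 has the largest choice current but a negative comparator current, so the competition is decided among the remaining categories only.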
III. Yield and Area Optimization
To obtain good system precision it is important that
all $L_A$ and $L_B$ synapse current sources match
within a certain limit. In our first prototype, a tree-like
current-mirror structure was implemented to generate all
$L_A$ and $L_B$ currents from two external current
references. The external current references enter a
multiple-output current mirror, which delivers several
output currents that in turn feed another stage of
multiple-output current mirrors. Each multiple-output
current mirror has at most ten outputs and is laid out
using common-centroid techniques to reduce the
gradient-induced mismatch. After a few stages, several
thousand $L_A$ and $L_B$ currents are available which
match with a precision better than 1% for current levels
higher than 5 µA. However, this structure is very area
consuming, which results in a very poor yield. That
prototype had a die area of 1 cm², with a
100-node F1 layer and an 18-node F2 layer, and
exhibited a yield performance of 6%.
Our results of mismatch characterization showed
that it was possible to eliminate the tree-like current
mirror structure while maintaining a current precision
better than 1%. A new ART1 prototype was designed
with an area 15 times less than that of the first prototype
and a 98% yield performance. This prototype chip
occupies an area of 2.5 mm × 2.2 mm and has a 50-node
F1 layer and a 10-node F2 layer.
For the mismatch characterization, a special
purpose chip in the ES2 1.0 µm technology was designed
[4]. The chip contains a matrix of cells, each of them
containing different sized PMOS and NMOS transistors,
plus decoding circuitry. A simplified diagram of the chip
and the experimental set-up to measure the transistors is
depicted in Fig. 3. All NMOS transistors in the chip have
their sources connected together to pin S. All NMOS
transistors share their drains at pin DN and all PMOS
transistors have their drains connected to pin DP. Every
transistor in the chip has its gate short-circuited to its
source except for one pair of NMOS and PMOS
transistors. The transistors of the selected pair have their gates
connected to pin G. A host computer controls the
selection decoder and a curve tracer (HP4145). If pin DP
is left unconnected and the curve tracer is connected to
pins S, DN and G each NMOS transistor can be
separately characterized. In a similar way, if pin DN is
left unconnected, each PMOS transistor can be measured
by connecting the curve tracer to pins S, DP and G.
Fig. 3: Experimental Set-Up for Transistor Mismatch
Characterization
Fig. 4: Measured Currents for an Array of MOS Transistors
with the same VGS and VDS values, (a) NMOS Transistors
and (b) PMOS Transistors
NMOS and PMOS transistors of size
10 µm × 10 µm spread over an area of 2.5 mm × 2.5 mm
were forced to the same $V_{GS}$ and $V_{DS}$ voltages so that
their nominal current was around 10 µA. The
measured currents flowing through the transistors are
depicted in Fig. 4. Fig. 4(a) shows the currents flowing
through the NMOS transistors as a function of the
transistor position in the array. Fig. 4(b) depicts the same
for the PMOS transistors. As can be seen, each
surface $I_o(x, y)$ has two deviation components: a
long-distance gradient component, and a short-distance
noise component. For each surface, the plane
$I_o^p(x, y) = Ax + By + C$ that best fits the points of the
measured surface $I_o(x, y)$ is computed. Afterwards, the
standard deviation $\sigma(\Delta I_o)$ of the difference
$$\Delta I_o(x, y) = I_o(x, y) - I_o^p(x, y) \qquad (2)$$
is computed. This deviation is due to the noise
component of surface $I_o(x, y)$. The gradient component
is defined by plane $I_o^p(x, y)$. The maximum deviation
due to the gradient component is given by
$$\Delta I_o^p = \max\{I_o^p(x, y)\} - \min\{I_o^p(x, y)\} . \qquad (3)$$
On the other hand, for the noise component, 98% of the
points remain within $\pm 3\sigma(\Delta I_o)$. Consequently, let
us define the ratio between noise component and
gradient component contributions as
$$r = \frac{6\,\sigma(\Delta I_o)}{\Delta I_o^p} . \qquad (4)$$
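
The decomposition in (2) to (4) can be sketched as a short script. This is a minimal illustration under stated assumptions, not the authors' analysis code: the function names `solve3` and `mismatch_components` are hypothetical, the plane is fitted by ordinary least squares, the standard deviation is taken as the root-mean-square residual, and the demonstration data are synthetic (a plane plus a checkerboard perturbation), not measurements.

```python
import math


def solve3(a, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            factor = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= factor * m[col][c]
    x = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, 3))
        x[r] = (m[r][3] - tail) / m[r][r]
    return x


def mismatch_components(samples):
    """samples: (x, y, current) tuples. Returns (sigma_noise, delta_gradient, r)."""
    pts = list(samples)
    n = len(pts)
    sx = sum(p[0] for p in pts)
    sy = sum(p[1] for p in pts)
    si = sum(p[2] for p in pts)
    sxx = sum(p[0] * p[0] for p in pts)
    syy = sum(p[1] * p[1] for p in pts)
    sxy = sum(p[0] * p[1] for p in pts)
    sxi = sum(p[0] * p[2] for p in pts)
    syi = sum(p[1] * p[2] for p in pts)
    # Normal equations of the least-squares plane A*x + B*y + C.
    A, B, C = solve3([[sxx, sxy, sx], [sxy, syy, sy], [sx, sy, n]],
                     [sxi, syi, si])
    plane = [A * x + B * y + C for x, y, _ in pts]
    residual = [p[2] - fit for p, fit in zip(pts, plane)]     # eq. (2)
    sigma = math.sqrt(sum(d * d for d in residual) / n)       # noise component
    delta = max(plane) - min(plane)                           # eq. (3)
    r = 6.0 * sigma / delta if delta > 0 else math.inf        # eq. (4)
    return sigma, delta, r


# Synthetic 4 x 4 array: 10 uA nominal, linear gradient, +/-0.05 uA checkerboard.
grid = [(x, y, 10.0 + 0.01 * x + 0.02 * y + 0.05 * (-1) ** (x + y))
        for x in range(4) for y in range(4)]
sigma, delta, r = mismatch_components(grid)
print(f"sigma = {sigma:.4f} uA, delta = {delta:.4f} uA, r = {r:.3f}")
```

For this synthetic array the perturbation does not affect the fitted plane, so the script should recover a noise component of about 0.05 µA, a gradient component of about 0.09 µA and a ratio of about 3.33. Dividing both components by the nominal current gives the percentages used in the tables; the ratio is unchanged.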
Eight chips could be fully characterized. Each chip
contains several arrays of NMOS and PMOS transistors
of different sizes spread over an area of
2.5 mm × 2.5 mm. Table I shows the results for NMOS
transistors of size 10 µm × 10 µm driving a nominal
current of 10 µA. The table shows the noise error
component $\sigma(\Delta I_o)$, the gradient error component $\Delta I_o^p$,
the ratio $r$, and the total error component $\sigma_T(I_o)$
(gradient + noise). Table II contains the same information
but for PMOS transistors. As can be seen, for these chip
dimensions, this current level and these transistor geometries,
the noise error contribution is of the same order or higher
than the gradient error contribution, and the total current
error $\sigma_T(I_o)$ is always less than 1%. Consequently, for
these conditions it is possible to avoid the use of high
area consuming tree-like mirror structures and directly
implement a simple current mirror with all the needed
outputs. This is the approach used in the present ART1
chip prototype. The chip has a die area of
2.5 mm × 2.2 mm, and contains an array of 50 × 10
synapses. Fig. 5 depicts the measured $L_B$ output currents
as a function of the output transistor position in one chip
for an input current level of 10 µA. Table III contains the
deviation components measured in the $L_B$ output
currents. The random component is always higher than
the gradient component and the total deviation is less
than 1%. Similar results are obtained for the two $L_A$
current sources.

chip   $\sigma(\Delta I_o)$ (%)   $\Delta I_o^p$ (%)   $r$      $\sigma_T(I_o)$ (%)
1      0.57                      1.30                2.652    0.67
2      0.62                      1.98                1.874    0.83
3      0.47                      3.10                0.921    0.79
4      0.52                      0.90                3.456    0.56
5      0.54                      1.65                1.959    0.64
6      0.58                      3.01                1.160    0.88
7      0.65                      1.96                1.996    0.82
8      0.73                      2.15                2.027    0.90
Table I: Output current error in an NMOS array

chip   $\sigma(\Delta I_o)$ (%)   $\Delta I_o^p$ (%)   $r$      $\sigma_T(I_o)$ (%)
1      0.58                      1.53                2.278    0.67
2      0.47                      0.74                3.830    0.51
3      0.48                      0.83                3.519    0.51
4      0.40                      2.18                1.100    0.63
5      0.46                      0.60                4.666    0.49
6      0.45                      2.18                1.236    0.72
7      0.44                      0.83                3.171    0.50
8      0.41                      1.28                1.926    0.50
Table II: Output current error in a PMOS array
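
As a quick consistency check of the ratio defined in (4), the tabulated ratios can be recomputed from the noise and gradient columns. The worked example below uses chip 4 of Table II (0.40% and 2.18%) and chip 4 of Table I (0.52% and 0.90%); it is only an arithmetic illustration based on the rounded table entries.

```latex
r_{\mathrm{PMOS},4} = \frac{6 \times 0.40}{2.18} \approx 1.10,
\qquad
r_{\mathrm{NMOS},4} = \frac{6 \times 0.52}{0.90} \approx 3.47
```

The tabulated values are 1.100 and 3.456. The small difference in the second case is presumably due to rounding of the tabulated noise and gradient entries.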
IV. Experimental Results
All ten fabricated chip samples were fully
operational, and we were not able to detect any fault
in the subcircuits of any of them. All system components could
be isolated and independently characterized. The circuit
performances of the different subcircuits were similar to
those of the first prototype [2].
Although the chip is analog in nature, its inputs and
outputs are digital. Therefore, it is possible to test the
system-level behavior using digital test equipment. We
used the HP82000 test equipment to fully test the
system-level operation. The system proved to be very robust and
therefore a multichip system was easy to assemble. The
operation of two multichip systems was also tested: a
two-chip ART1 system and a three-chip system forming
an ARTMAP architecture.
Fig. 5: Measured LB Current in an ART1 Prototype Chip
chip   $\sigma(\Delta L_B)$ (%)   $\Delta L_B^p$ (%)   $r_{L_B}$   $\sigma_T(L_B)$ (%)
1      0.62                      0.62                6.0?6       0.64
2      0.59                      0.22                16.497      0.60
3      0.56                      3.330               1.015       0.89
4      0.63                      0.90                4.196       0.64
5      0.65                      1.83                2.118       0.76
6      0.64                      1.49                2.565       0.73
7      0.60                      1.58                2.255       0.67
8      0.62                      1.48                2.524       0.71
9      0.63                      0.37                10.080      0.63
10     0.57                      2.16                1.573       0.73
Fig. 6: Diagram of an ARTMAP Architecture
The two-chip ART1 system consists of two
horizontally assembled ART1 chips. The resulting
system is able to cope with 100-bit input patterns.
The ARTMAP architecture consists of two ART1
subsystems connected through an Inter-ART module as
depicted in Fig. 6, where a is an $N_a$-dimensional input
vector to the first subsystem ART1a, and b an
$N_b$-dimensional input vector for the second subsystem
ART1b. An ARTMAP system is a supervised
learning neural network that learns the correspondence
between two simultaneous input patterns a and b. The
Inter-ART module is simply an $M_a \times M_b$ array of binary
weights which learns the correspondence between the
ART1a category which classifies pattern a and the
ART1b category which classifies pattern b. An
ARTMAP hardware system was assembled using two
ART1 chips and an extra chip for the Inter-ART
module. The system-level operation of the ARTMAP
hardware system has also been tested using the HP82000
digital test equipment.
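
The role of the Inter-ART array can be illustrated with a small software model. This is a hedged sketch, not the extra chip's circuit, which is not described here: the class name `InterArtMap` and its methods are hypothetical, and the learning rule (all weights start at '1' and a row is intersected with the one-hot ART1b output) is an assumption modeled on the map field of ARTMAP [5]. Match tracking is left out.

```python
class InterArtMap:
    """Hypothetical Ma x Mb array of binary weights linking ART1a to ART1b."""

    def __init__(self, Ma, Mb):
        # Assumed initial state: every ART1a category may map to any ART1b category.
        self.w = [[1] * Mb for _ in range(Ma)]

    def consistent(self, ja, jb):
        """True if ART1a category ja is still allowed to predict ART1b category jb."""
        return self.w[ja][jb] == 1

    def learn(self, ja, jb):
        """Intersect row ja with the one-hot output of ART1b (category jb active)."""
        self.w[ja] = [w if k == jb else 0 for k, w in enumerate(self.w[ja])]

    def predict(self, ja):
        """ART1b category predicted by ja, or None while the row is uncommitted."""
        row = self.w[ja]
        return row.index(1) if sum(row) == 1 else None


m = InterArtMap(Ma=3, Mb=2)
m.learn(0, 1)                 # pattern pair classified as (ART1a: 0, ART1b: 1)
print(m.predict(0), m.consistent(0, 0), m.predict(1))
```

Once a row has been committed, a later pair that selects the same ART1a category but a different ART1b category shows up as an inconsistency, which in a full ARTMAP system would trigger a new search in ART1a.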
V. References
[1] G. A. Carpenter and S. Grossberg, "A Massively
    Parallel Architecture for a Self-Organizing Neural
    Pattern Recognition Machine," Computer Vision,
    Graphics, and Image Processing, vol. 37, pp. 54-115, 1987.
[2] T. Serrano-Gotarredona and B. Linares-Barranco,
    "A Real-Time Clustering Microchip Neural
    Engine," IEEE Transactions on VLSI Systems, 1996.
[3] T. Serrano-Gotarredona and B. Linares-Barranco,
    "A Modified ART1 Algorithm more suitable for
    VLSI Implementations," Neural Networks, 1996.
[4] T. Serrano-Gotarredona and B. Linares-Barranco,
    "Systematic CMOS Mismatch Characterization,"
    Proceedings of the IEEE Int. Symposium on
    Circuits and Systems, 1996.
[5] G. A. Carpenter, S. Grossberg, and J. H. Reynolds,
    "ARTMAP: Supervised Real-Time Learning and
    Classification of Nonstationary Data by a
    Self-Organizing Neural Network," Neural
    Networks, vol. 4, pp. 565-588, 1991.
Table III: Measured $L_B$ output current error
