Hybrid hardware for a highly parallel search in the context of learning classifiers  by Bode, M. et al.
Artificial Intelligence 130 (2001) 75–84
Hybrid hardware for a highly parallel search
in the context of learning classifiers
M. Bode a,∗, O. Freyd a, J. Fischer a, F.-J. Niedernostheide b,
H.-J. Schulze b
a Westfälische Wilhelms-Universität Münster, Institut für Angewandte Physik, Corrensstr. 2-4,
48149 Münster, Germany
b Infineon AG, Otto-Hahn-Ring 6, 81730 München, Germany
Received 18 April 2000; received in revised form 7 November 2000
Abstract
Based on a comparison of input data with a set of prototypes, classifier systems identify the most
appropriate representative for a given sample pattern. One remarkable classifier is Kohonen’s Self-
Organizing Map and the related learning vector quantizer, as these algorithms are highly parallel.
For real-time applications the classifier search may be one of the time critical processes. We discuss
specialized hardware being able to execute such a search in a fully parallel manner. Also the learning
and updating of prototypes is performed in parallel controlled by a propagating front. Finally, we
present experimental results concerning an unsupervised learning vector quantizer (LVQ) and a
self-organizing map (SOM) obtained from our thyristor-based analog-digital hybrid system. © 2001
Elsevier Science B.V. All rights reserved.
Keywords: Self-organizing map; Learning vector quantizer; Unsupervised learning; Neural net hardware;
Analog; Front propagation; Thyristor
1. Introduction
An adaptive classifier, such as a learning vector quantizer (LVQ) or a self-organizing
map (SOM) [5], has to solve three (LVQ) or four (SOM) essential and potentially time-
consuming tasks, respectively:
(1) Process given input data by means of filters associated with a set of prototypes to
derive an individual “fitness” for each potential representative.
* Corresponding author.
E-mail address: bodemat@uni-muenster.de (M. Bode).
0004-3702/01/$ – see front matter © 2001 Elsevier Science B.V. All rights reserved.
PII: S0004-3702(01)00053-4
76 M. Bode et al. / Artificial Intelligence 130 (2001) 75–84
(2) Find the winner, i.e., the representative with maximum fitness.
(3) SOM only: Address a “cortical” neighborhood of the winner and determine the
individual distance of each member of this set from the winner in order to organize
the learning process.
(4) Execute the learning procedure, i.e., change the prototype values of the winner and,
in the SOM case, of its neighbors towards the given input data.
Each of these tasks involves many or all of the representatives.
(1) Fitness values have to be derived for each unit.
(2) To determine the winner, in general each unit has to be taken into account.
(3) Each unit out of the, possibly large, neighborhood has to be aware of its distance
from the winner.
(4) Individual parameters have to be updated for each single unit involved in the learning
process.
In order to address all of these tasks in a time efficient manner we use a highly parallel
architecture and adequately reformulated versions of the algorithms involved.
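For orientation, the four tasks can be written down as a purely sequential software step. The following is a minimal sketch, not the hardware algorithm: it assumes a 1D chain of units, takes the winner to be the unit with the smallest city-block distance (which is how Section 2 realizes the search), and uses a hypothetical triangular neighborhood. All function names and numerical values are illustrative.

```python
def l1_distance(a, b):
    """City-block (L1) distance between two vectors of equal length."""
    return sum(abs(x - y) for x, y in zip(a, b))


def adapt_step(prototypes, u, rate, neighborhood):
    """One sequential adaptation step for a 1D chain of units.

    `neighborhood` maps the chain distance to the winner onto a factor in [0, 1].
    """
    # Task (1): fitness of every unit, here the L1 distance to the input.
    distances = [l1_distance(w, u) for w in prototypes]
    # Task (2): the winner is the unit whose prototype is closest to the input.
    winner = min(range(len(prototypes)), key=distances.__getitem__)
    updated = []
    for i, w in enumerate(prototypes):
        # Task (3): "cortical" distance of unit i from the winner.
        h = neighborhood(abs(i - winner))
        # Task (4): move the prototype towards the input.
        updated.append([wk + rate * h * (uk - wk) for wk, uk in zip(w, u)])
    return updated, winner


def lvq_neighborhood(d):
    """LVQ case: only the winner learns."""
    return 1.0 if d == 0 else 0.0


def som_neighborhood(d, width=2.0):
    """Hypothetical triangular neighborhood, zero at chain distance `width`."""
    return max(0.0, 1.0 - d / width)


prototypes = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
u = [1.2, 0.9]
som_result, som_winner = adapt_step(prototypes, u, 0.5, som_neighborhood)
lvq_result, lvq_winner = adapt_step(prototypes, u, 0.5, lvq_neighborhood)
print(som_winner, som_result)
print(lvq_winner, lvq_result)
```

Every loop in this sketch touches all units one after another, which is exactly the cost that a parallel architecture is meant to remove.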
2. Hardware concept
The main components of the hardware concept used for both the LVQ and the SOM
application are summarized in Fig. 1. The hardware consists of a global controller circuit
and many individual processing units, sometimes called “neurons”. Each unit stores
one prototype vector, calculates the fitness function and compares it against a common
threshold.
Fig. 1. Schematic of the hardware: for clarity only a single gate unit (neuron) is shown.
The units are connected to an active medium, namely a multi-gate thyristor structure,
each unit to an individual gate. Each unit can ignite (turn on) the thyristor locally, and
check whether it is ignited at the unit’s position by monitoring the gate voltage. If the
thyristor is ignited locally, this change of state will propagate as a front through the device,
so other units at neighboring gates will detect the ignition after a certain delay. Thus we can
use this delay as a measure of cortical distance, which is needed for the learning algorithm.
2.1. Prototypes and samples
Each unit stores one prototype vector W, compares it to the input vector U, and outputs
the city-block distance ||U − W||_L1 as its “fitness” value. This is done in analog circuitry,
with capacitors as analog storage, analog switches for reading and writing, and op amps to
calculate the fitness value. A bus system connects all units to the control unit and allows it to
(1) write a prototype to an individual unit,
(2) read the prototype of an individual unit,
(3) simultaneously apply an input vector to all neurons,
(4) communicate winner detection.
2.2. Parallel search and winner detection
The winner detection is parallelized in the following way: Each unit compares its fitness
value against a common threshold, which is ramped up by the control circuit. If the
common threshold exceeds the fitness value of a given unit, this unit emits a “success”
signal on the bus that stops the ramp and disables all its competitors. As a consequence,
the first unit that crosses the threshold becomes the winner.
The “success” signal also ignites the thyristor locally, such that the propagating front
can be exploited for communication purposes in connection with the learning rules.
The search time does not depend on the number of units, but on the ramp rate, which
is determined by the desired accuracy of comparison. As the emission and detection of
the “success” signal takes time, two units should not fire in a time interval less than this
reaction time.
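The behavior of the ramp search can be emulated in software. The sketch below is an idealized model under stated assumptions: the ramp rises in discrete steps of size `ramp_step` (a stand-in for the ramp rise during one reaction time), and units that fire within the same step cannot be told apart, so the lowest index is picked arbitrarily. The function name and all numbers are hypothetical.

```python
def ramp_search(fitness, ramp_step, max_level):
    """Return (winner, steps) for a threshold that rises in steps of ramp_step.

    The winner is the first unit whose fitness (distance) is exceeded by the
    threshold. Returns (None, steps) if the ramp ends without any unit firing.
    """
    steps = 0
    threshold = 0.0
    while threshold <= max_level:
        # In the hardware, all units perform this comparison simultaneously.
        fired = [i for i, f in enumerate(fitness) if threshold > f]
        if fired:
            # Units firing within one step are indistinguishable in this model.
            return fired[0], steps
        steps += 1
        threshold = steps * ramp_step
    return None, steps


small = [2.1, 0.3, 1.9, 3.9]
large = small + [5.0] * 1000           # many additional, poorly matching units
print(ramp_search(small, 0.25, 8.0))
print(ramp_search(large, 0.25, 8.0))   # same number of ramp steps

close = [0.325, 0.305]                 # two nearly equal fitness values
print(ramp_search(close, 0.25, 8.0))   # coarse ramp: both fire in one step
print(ramp_search(close, 0.01, 8.0))   # fine ramp: the closer unit is resolved
```

The number of ramp steps is the same for the small and the large set of units, while the pair of nearly equal values is only separated by the finer, and therefore slower, ramp.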
2.3. Updating/Learning
The typical learning rule for a stepwise update of prototype vectors as it is used in the
framework of both unsupervised Learning Vector Quantizers and Self-Organizing Maps
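reads

$$W_i^{\mathrm{new}} = W_i^{\mathrm{old}} + \tilde{\varepsilon}\, h(d_{ij})\,\bigl(U - W_i^{\mathrm{old}}\bigr), \qquad (1)$$
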
where 0 < ε̃ < 1, and h(d_ij) denotes a unimodal “neighborhood function” depending
on the distance d_ij between unit i and the winner j, with a maximum at d = 0, i.e., at
i = j. This leads to an ordered mapping of the input space in typical situations such that
neighboring prototype vectors tend to represent similar inputs. For the LVQ case we have
simply h(d_ij) = δ_i,j (δ_j,j = 1, δ_i,j = 0 ∀ i ≠ j), i.e., only the winner performs a learning
step.
The direct implementation of the learning rule was replaced by an implicit method. To
completely avoid both explicit multiplication and the computation of h(d_ij), we replace
the time-discrete rule with an ordinary differential equation (o.d.e.):

$$\frac{dW_i}{dt} = \varepsilon\,(U - W_i). \qquad (2)$$
If this o.d.e. is integrated over a learning time t_l, we essentially recover the discrete learning
rule with an effective, locally convex neighborhood function h(d_ij), which is sufficient to
obtain topology-preserving mappings [2]. This o.d.e. is the same law that governs
the charging of a capacitor in an RC circuit. Consequently, the learning process can be
implemented by connecting the storage capacitor C via a resistor R and an analog switch
to the input voltage.
Thus we get ε = (RC)^(−1), and the corresponding time-discrete learning rate ε̃ h(d_ij) is
determined by the time constant RC and the learning time t_l, as follows from Ohm’s law.
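As a short worked derivation of this correspondence (a sketch, assuming that the input U is held constant during the learning time t_l and that the analog switch is ideal), Eq. (2) with ε = 1/(RC) integrates to

$$W_i(t_l) = W_i(0) + \bigl(1 - e^{-t_l/(RC)}\bigr)\,\bigl(U - W_i(0)\bigr).$$

Comparing this with a discrete update of the form W_i → W_i + ε̃ h(d_ij)(U − W_i) suggests the identification

$$\tilde{\varepsilon}\,h(d_{ij}) = 1 - e^{-t_l/(RC)} \approx \frac{t_l}{RC} \quad \text{for } t_l \ll RC,$$

so the effective rate always lies between 0 and 1 and is set by the ratio of the learning time to the time constant.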
2.4. Communication
In Kohonen’s algorithm, the learning rate of each neuron depends on the distance to the
winner neuron. We determine this distance in the form of the propagation time of a front
that is started by the winning neuron. It propagates in an active medium [1,10,11], namely
a thyristor structure.
The running front is used to control the learning time of each unit: When a unit detects
the passage of the front, it starts learning. A time T after the front ignition, all units stop
learning. Thus, the winner unit has the longest learning time t_l = T, which decreases to
t_l = T − δt for its neighbors, where the front delay δt grows with the distance to the winner unit. Units with
propagation times δt ≥ T will not learn at all. So T essentially controls the size of the
neighborhood function h(d_ij). This learning process is illustrated in Fig. 2.
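The resulting neighborhood profile can be estimated numerically. The sketch below combines the front-controlled learning time t_l = max(0, T − δt) with the exponential RC charging law. It assumes a constant front delay per gate (`gate_delay`, a hypothetical 200 µs) and an ideal switch; the values of R, C and T are borrowed from the experiments in Section 4 for illustration only.

```python
import math


def front_learning_rates(winner, n_units, gate_delay, T, R, C):
    """Effective learning rate of each unit in a 1D chain for one learning step.

    gate_delay: assumed front travel time between neighboring gates (s)
    T:          global learning timer interval (s)
    R, C:       resistance (ohm) and capacitance (F) of the learning circuit
    """
    rates = []
    for i in range(n_units):
        delay = abs(i - winner) * gate_delay   # front arrival time at unit i
        t_learn = max(0.0, T - delay)          # learning stops at time T
        rates.append(1.0 - math.exp(-t_learn / (R * C)))
    return rates


# Wide neighborhood: T is larger than the delay to the second neighbor.
som_rates = front_learning_rates(2, 5, 200e-6, 512e-6, 10e3, 100e-9)
# Winner only: T is smaller than the delay to the nearest neighbor.
lvq_rates = front_learning_rates(2, 5, 200e-6, 128e-6, 10e3, 100e-9)
print([round(r, 3) for r in som_rates])
print([round(r, 3) for r in lvq_rates])
```

In this model the rate profile is symmetric around the winner and falls to zero once the front delay reaches T, so shrinking T shrinks the neighborhood until only the winner learns.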
Fig. 2. Front-based learning process. As the “cortical” wave propagates, more and more prototypes participate in
learning, i.e., they start approaching the input U .
3. Hardware prototype
Our first hardware prototype consists of a controller board and one board for each
neuron, connected by a simple bus system. Currently there are 5 neurons in a linear
topography (1D SOM) for a 2D input space.
The thyristor is a sample with 5 gate contacts, positioned linearly in an active region
of 25 × 5 mm². It was prepared in our lab from a 4-layer silicon structure. We used
photolithography and wet etching to provide access to the gate layer, and then evaporated
aluminum to obtain Ohmic contacts. Electrical and optical experiments showed that each
gate contact can ignite a front, and that these fronts can be detected by observing the
potential at the gate contacts.
The prototype is used to check the performance of the analog neurons and the connecting
medium. An integrated version with more neurons in a 2D topography, possibly with a
higher dimensional input space, is planned.
Each neuron is connected to the bus that provides several common signals: The current
input vector, coded as two analog voltages, as well as some digital control signals. The
circuitry of each neuron (cf. Fig. 1) contains the memory capacitors for its prototype
value, some analog switches to control learning, and an analog arithmetic part made of
op amps that calculates the (city-block) distance between prototype and input vector. The
gate current that each unit feeds into its gate is proportional to the distance plus a ramp
voltage. To find the winner unit, the ramp voltage is slowly increased, thus the gate currents
rise, until the thyristor ignites at the gate with the highest gate current. This ignition
is detected by a detector monitoring the anode current. On ignition the gate currents
are turned off immediately, so each unit can detect the state of its gate (on or off) by
monitoring the gate voltage. A global learning timer is started. When a unit detects that its
gate is “turned on”, it starts learning. When the global learning timer stops, all units stop
learning.
Each unit also contains a winner detection flip-flop. It is set when a unit is “turned on”,
and one unit in the “on” state blocks all other units via the “mutual exclusion” line, so only
one unit can claim to be the winner.
The global control unit contains the global timer, a detector for global ignition, a ramp
generator, and provides the connections to the buses. The analog data bus and the digital
signals are controlled by a microcontroller with 12-bit A/D and D/A converters, and data
is transferred from/to a PC via a serial connection.
In the present setup the front propagation through the whole sample takes ca. 800 µs,
which determines the maximum learning time. In future this can be reduced by reducing
the system size, or by increasing the front speed, which depends on the anode current
density. The ramp duration (winner search time) is set to ca. 50 ms. This rather slow ramp
was chosen to keep the thyristor in a quasi-stationary state, to be sure that the gate with the
largest gate current will trigger the front.
4. Applications
To demonstrate the principal features of our setup, we present two applications.
4.1. Learning vector quantizer
In a first experiment we use three learning units to represent a two-dimensional planar
input space. Coordinates range from −4.0 to 4.0 V both in x- and y-direction. The
3 prototypes are initialized at random positions near the center. In the LVQ algorithm
only the winner learns, in other words h(d_ij) = δ_i,j. This is achieved by choosing the
timer interval T smaller than the minimum front delay δt_min corresponding to the
nearest-neighbor distance, see Section 2.4. Thus the spreading front cannot reach the next unit within the
timer interval and only the winner unit learns. Fig. 3 shows the traces of the prototypes as
they move during the learning process.
In Fig. 4 the prototype positions after learning are shown together with the Voronoi cells
[4] corresponding to each of the three units. These cells represent the actual response of the
network to input vectors. The theoretical Voronoi cells calculated with the L1-norm show
only minor differences.
These differences can be easily explained when we take into account that the L1-norm
of the difference (U −W) of an input U and a prototype vector W is only approximately
computed by means of several differential amplifiers. If the corresponding amplification
coefficients differ between prototypes, their sectors will be deformed.
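The effect of mismatched amplification on the cells can be illustrated with a small numerical experiment. The sketch below is a simplified model, not a description of the actual circuit: each unit's L1 distance is multiplied by a hypothetical gain factor, the winner is the unit with the smallest scaled distance, and cell sizes are counted on a regular grid over the input range. The prototype positions and gain values are invented for illustration.

```python
def l1(a, b):
    """City-block (L1) distance between two points."""
    return sum(abs(x - y) for x, y in zip(a, b))


def winner(u, prototypes, gains):
    """Index of the unit with the smallest gain-scaled L1 distance to u."""
    scores = [g * l1(w, u) for w, g in zip(prototypes, gains)]
    return min(range(len(scores)), key=scores.__getitem__)


def cell_sizes(prototypes, gains, lo=-4.0, hi=4.0, n=41):
    """Count the grid points in [lo, hi] x [lo, hi] won by each unit."""
    step = (hi - lo) / (n - 1)
    counts = [0] * len(prototypes)
    for ix in range(n):
        for iy in range(n):
            u = (lo + ix * step, lo + iy * step)
            counts[winner(u, prototypes, gains)] += 1
    return counts


prototypes = [(-2.0, -1.0), (2.0, -1.0), (0.0, 2.0)]   # hypothetical positions
ideal = cell_sizes(prototypes, [1.0, 1.0, 1.0])        # perfectly matched units
skewed = cell_sizes(prototypes, [1.1, 1.0, 1.0])       # unit 0 reports 10% too much
print(ideal)
print(skewed)
```

A unit that reports a slightly too large distance loses grid points along its borders, so its measured cell shrinks and the cells of its neighbors grow, even though the prototype positions are unchanged.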
Fig. 3. Learning traces of three prototype vectors (squares) in an LVQ experiment. Parameters: R = 10 kΩ,
C = 100 nF, T = 128 µs.
Fig. 4. Measured (left) and calculated (right) Voronoi cells.
4.2. Self-organizing map
In order to realize SOM learning the global timer interval T has to be rather large at the
beginning of the learning phase, to permit front-spreading over several units, and then T
is decreased so the learning neighborhood shrinks. For the convergence of the algorithm
it is important that both the spatial neighborhood that participates in learning, and the
learning strength, decrease during learning [5]. Thus in the learning process T is gradually
decreased while R is increased. The result of a corresponding five prototype experiment
is shown in Fig. 5. At the beginning the five prototype vectors were distributed randomly
in the two-dimensional input space with x- and y-coordinates ranging from −4.0 to 4.0 V.
Then patterns from a marked area were presented randomly to the network. In the first
phase the learning time T was set to a rather large value of 512 µs, so that the learning
neighborhood extended about 2 neurons to either side. When the neurons had moved into
the marked area, the neighborhood size and the learning rate were reduced. This way the
system relaxed to a good coverage of the region from which the input samples were drawn.
As shown in Fig. 5, the five prototypes are nearly uniformly distributed over the gray
area after learning. As expected, the units are correctly ordered; thus the system shows
the topology preservation typical for Kohonen’s SOM. This order is reached rather quickly
from any disordered initial state, as is expected for a SOM with very few units. The Voronoi
cells corresponding to the 5 units, as determined by the analog winner detection, still show
some flaws: some cells are too small while others are too large, especially if a small cell
is located next to some larger cells. This effect is due to imperfectly balanced amplifiers
in the distance calculation part of each unit, and to unequal ignition currents of the 5
gate contacts of the thyristor sample. This favours some units with respect to the winner
detection, so that such a unit tends to have a larger Voronoi cell, i.e., it will win more often
than others. These effects will be reduced in the next, possibly integrated version of the
design.
Fig. 5. Positions of the five prototype vectors in a SOM experiment after ca. 1000 learning steps. The random
input patterns are marked by black dots. Parameters: R = 10 (162) kΩ, C = 100 (100) nF, T = 512 (64) µs
during the ordering phase (convergence phase) of training.
We emphasize that similar distributions of the prototype vectors are obtained, even when
the learning parameters are varied in a wide parameter range, as one would expect for
Kohonen’s SOM algorithm.
There is a slow drift of the stored values on the order of 1 LSB/s (with a 12-bit ADC), so
it takes about 15 s until the accuracy of the stored values drops below 8 bits.
This can be circumvented by continuous training with small learning rate, which would
yield a steadily adapting system, or by reading out the stored values and refreshing them
from time to time, like a dynamic RAM.
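A quick consistency check of the drift figures, assuming a constant drift rate: reducing the accuracy from 12 to 8 bits corresponds to a tolerated drift of

$$2^{12-8} = 16\ \text{LSB}, \qquad t \approx \frac{16\ \text{LSB}}{1\ \text{LSB/s}} = 16\ \text{s},$$

which agrees with the quoted value of about 15 s.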
5. Discussion and outlook
We presented a hardware concept for a fully parallel search problem as it arises in
the context of Learning Vector Quantizers (LVQs) and Self-Organizing Maps (SOMs).
The foundations for this work were laid in [10] and [11]. The search process is based
on a parallel comparison of individually obtained “fitness” values to a common ramp
signal. This comparison is performed by means of a planar thyristor-like four-layer silicon
structure with five gates corresponding to up to five processing units. When the ramp
crosses one of the individual “fitness” values the thyristor is locally ignited which is
detected and indicates both the success of the search and the respective “winner”. As a
consequence of the ignition a switching front propagates through the thyristor that can be
detected by other units. In our implementation we use this effect to provide an efficient
means for communication between different units which is needed for SOM applications.
We chose to work with analog implementations of the learning rules instead of the more
common digital approach. The loss of accuracy seems to be tolerable, in particular for
neural network applications.
The present setup is of course very limited in the number of units, but it shows that
larger hardware implementations of Kohonen’s Self Organizing Map are feasible. We are
planning a 2-dimensional SOM implemented in a similar fashion.
The time for the winner search does not depend on the number of units, but on the required
accuracy: within the reaction time t_r of the thyristor, the ramp should rise by no more than one bit of
the input voltage. So for a large number of units and a moderate accuracy, like 8 or
10 bits, the implementation should be favorable. Higher accuracy is difficult to achieve
anyway, because of the drift currents that affect the stored voltages. The time needed for
one learning step will increase with the chosen neighborhood size. Since for larger networks
one will choose a larger neighborhood at the beginning of the training phase, but not
necessarily later, this learning time should increase less than linearly with the network size.
The speed of the front ignition and propagation itself could be increased by using thinner
thyristor structures, and higher integration density of gate units. Further improvement could
be achieved by integrating the neural units and the thyristor structure on a single chip.
A useful application for this SOM hardware could be speech compression and recognition
systems. These are our aims for further research.
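The scaling argument for the winner search made above can be written compactly. As a rough estimate, assuming a linear ramp that advances one resolution step per reaction time t_r and covers the full range of b bits, the worst-case search time is

$$T_{\text{search}} \approx 2^{b}\, t_r,$$

independent of the number of units. For a purely illustrative t_r = 10 µs (a hypothetical value, not a measured one), this gives about 2.6 ms for b = 8 and about 10 ms for b = 10.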
There are some similar works worth mentioning, e.g., the COKOS chip (COprocessor
for KOhonen’s Self-organizing Map) [12], the TInMANN chip [8] and BISOM [9]. These
are digital approaches, some are fully parallel, and BISOM is aimed at high-dimensional
but low-accuracy problems, e.g., 64× 1 bit or 16× 2 bits.
Analog implementations of the SOM have been described in [3,6,7], which are similar
to our approach in certain aspects. They also store the weights as charges in gate capacities,
and use analog circuitry to get a distance measure.
Our approach is also based on analog circuitry; the major difference is the use of
the thyristor to implement both the cortical neighborhood and the maximum (winner)
detection. This should allow very efficient implementations of self-organizing maps.
References
[1] M. Bode, Front-bifurcations in reaction-diffusion systems with inhomogeneous parameter distributions,
Physica D 106 (1997) 270–286.
[2] E. Erwin, K. Obermayer, K. Schulten, Self-organizing maps: Ordering, convergence properties and energy
functions, Biological Cybernetics 67 (1992) 47–55.
[3] P. Heim, B. Hochet, E. Vittoz, Generation of learning neighbourhood in Kohonen feature maps by means of
simple nonlinear network, Electronics Letters 27 (3) (1991) 275–277.
[4] J. Hertz, A. Krogh, R.G. Palmer, Introduction to the Theory of Neural Computation, Addison Wesley,
Reading, MA, 1991.
[5] T. Kohonen, Self-Organizing Maps, Springer, Berlin, 1995.
[6] D. Macq, M. Verleysen, P. Jespers, J.-D. Legat, Analog implementation of a Kohonen map with on-chip
learning, IEEE Trans. on Neural Networks 4 (3) (1993) 456–461.
[7] J.R. Mann, S. Gilbert, An analog self-organizing neural network chip, in: D.S. Touretzky (Ed.), Advances
in Neural Information Processing Systems, San Mateo, CA, 1989, pp. 739–747.
[8] M.S. Melton, T. Phan, D.S. Reeves, D.E. van den Bout, The TInMANN VLSI chip, IEEE Trans. Neural
Networks 3 (1992) 375–384.
[9] S. Rüping, U. Rücking, K. Goser, A chip for self-organizing feature maps, IEEE MICRO 15 (3) (1995)
57–59.
[10] D. Ruwisch, M. Bode, H.-G. Purwins, Parallel hardware implementation of Kohonen’s algorithm with an
active medium, Neural Networks 6 (1993) 1147–1157.
[11] D. Ruwisch, M. Bode, H.-J. Schulze, F.-J. Niedernostheide, Synergetic hardware concept for self-organizing
neural networks, in: J. Parisi, S.C. Müller, W. Zimmermann (Eds.), Nonlinear Physics of Complex Systems,
Lecture Notes in Physics, Vol. 476, Springer, Berlin, 1996, pp. 194–212.
[12] H. Speckmann, P. Thole, W. Rosenstiel, COKOS: A COprocessor for KOhonen’s selforganizing map, in:
S. Gielen, B. Kappen (Eds.), Proceedings of the International Conference on Artificial Neural Networks
(ICANN93), Amsterdam, 1993, pp. 1040–1044.
