A high-precision current-mode WTA-MAX circuit with multichip capability by Serrano Gotarredona, María Teresa & Linares Barranco, Bernabé
280 IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 33, NO. 2, FEBRUARY 1998
A High-Precision Current-Mode WTA-MAX Circuit with Multichip Capability
Teresa Serrano-Gotarredona and Bernabé Linares-Barranco
Abstract—This paper presents a circuit design technique suitable
for the realization of winner-take-all (WTA), maximum (MAX),
loser-take-all (LTA), and minimum (MIN) circuits. The technique
is based on current replication and comparison. Traditional
techniques rely on the matching of an array of N transistors,
where N is the number of system inputs. As N increases, the size
of the circuit and the distance between transistors also increase,
so transistor matching degrades and the overall system loses
precision. Furthermore, when multichip systems are required,
transistor matching is even worse and performance is drastically
degraded. The technique presented in this paper does not rely on
the proper matching of N transistors, but on the precise
replication and comparison of currents, which can be performed by
current mirrors with a limited number of outputs. Thus, N can
increase without degrading the precision, even if the system is
distributed among several chips. The chips constituting the system
can also come from different foundries without degrading the
overall system precision. Experimental results that attest to
these facts are presented.
Index Terms—Analog circuits, analog computation, current-mode
circuits, maximum circuits, transistor mismatch, winner-take-all.
I. INTRODUCTION
Winner-take-all (or loser-take-all) and MAX (or MIN) circuits
are often fundamental building blocks
in neural and/or fuzzy hardware systems [3]–[5]. Given a set
of external inputs, their operation
consists of determining which input presents the largest (or
smallest) value, or what this maximum (or minimum) value is,
respectively. If a winner-take-all (WTA) or MAX circuit is
available, a loser-take-all (LTA) or MIN circuit is obtained
by simply inverting the inputs.¹
Hence, this paper will concentrate only on WTA and MAX
circuits.
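The relation between the four operations can be illustrated with a purely behavioral sketch (it does not model any circuit). The function names below, and the choice of the largest input as the common offset added after inversion, are assumptions made here for illustration only.

```python
def wta(inputs):
    """Return one binary output per cell: 1 for the winner, 0 otherwise."""
    winner = max(range(len(inputs)), key=lambda i: inputs[i])
    return [1 if i == winner else 0 for i in range(len(inputs))]


def max_circuit(inputs):
    """Return the value of the largest input."""
    return max(inputs)


def invert(inputs, offset):
    """Invert every input and add a common offset (assumed: offset - x)."""
    return [offset - x for x in inputs]


def lta(inputs):
    """Loser-take-all: a WTA applied to the inverted inputs."""
    return wta(invert(inputs, max(inputs)))


def min_circuit(inputs):
    """Minimum: a MAX applied to the inverted inputs, then mapped back."""
    offset = max(inputs)
    return offset - max_circuit(invert(inputs, offset))


currents = [3.0, 7.5, 1.2, 5.0]  # hypothetical input values
print(wta(currents))  # [0, 1, 0, 0]
print(lta(currents))  # [0, 0, 1, 0]
```

Only the inversion step differs between the WTA/MAX and LTA/MIN variants, which is why the remainder of the paper can restrict itself to WTA and MAX.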
In the literature, the physical implementation of these systems
has been tackled through two main approaches: 1) systems of
O(N²) complexity, whose connectivity increases quadratically
with the number of inputs [6]–[10], and 2) systems of O(N)
complexity, whose connectivity increases linearly with the
number of inputs [1], [2]. In a system of O(N²) complexity, as
shown in Fig. 1(a), there is one cell per input; each cell has
an inhibitory connection (black triangle) to each of the other
cells and an excitatory connection (white triangle) to itself.
Therefore, the system has N² connections. Each cell receives
one external input. The cell that receives the maximum input
will turn all other cells OFF and will remain ON. If the system
is a WTA circuit, each cell has a binary output that indicates
whether the cell is ON or OFF.
Manuscript received March 4, 1996; revised July 1, 1997.
The authors are with the National Microelectronics Center (CNM), Ed.
CICA, 41012 Sevilla, Spain.
Publisher Item Identifier S 0018-9200(98)00728-8.
¹ Optionally, a common offset term may be added.
In a MAX circuit, the winning
cell will copy its input to a common output. Under some
circumstances² it is possible to convert the O(N²) topology
of Fig. 1(a) into an O(N) one, as shown in Fig. 1(b). In
these cases, a global inhibition term is computed. Each cell
contributes to this global inhibition, and each cell receives the
same global inhibition. Note that now each cell also contributes
to inhibiting itself. Consequently, the excitatory connection that
each cell has to itself must be increased to compensate for
this. Typical O(N) WTA circuits reported in the literature
[1], [2]³ correspond to the topology shown in Fig. 1(c). In
such circuits there are also N cells, each receiving an external
input. Each cell connects to a common node, through which
a global property (for example, a current) is shared among
all cells. The amount of that global property taken by each cell
depends (nonlinearly) on how much its input deviates from
an “average” of all inputs. Usually this “average” is not an
exact linear average, but is somehow nonlinearly dependent on
all inputs. The cell with the maximum input takes most (or
all) of the common global property, leaving the rest with little
or nothing. Because of the way this global property is shared and
the “average” is computed, the operation of these circuits
relies on the matching of the threshold voltages [1] and/or
other parameters of an array of transistors. The
number of transistors in the array equals, at least, the number
of inputs of the system. If the WTA or MAX circuit has so many
inputs that it must be distributed among
different chips, the matching of threshold voltages (and/or
other transistor parameters) will degrade significantly, and the
overall system will lose precision in its operation.
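The conversion from the O(N²) topology to the O(N) one can be illustrated numerically. The sketch below assumes a simple static drive model (self-excitation minus inhibition) that is not taken from the paper; the names `inhibit` and `w_self` are hypothetical. It shows that a single shared inhibition term, combined with an increased self-excitation, reproduces the pairwise result whenever the inhibition sent by a cell does not depend on the receiving cell.

```python
def net_drive_pairwise(x, inhibit, w_self):
    """O(N^2) form: each cell sums the inhibition sent by every other cell."""
    n = len(x)
    return [
        w_self * x[i] - sum(inhibit(x[j]) for j in range(n) if j != i)
        for i in range(n)
    ]


def net_drive_global(x, inhibit, w_self):
    """O(N) form: one shared inhibition term, computed once for all cells.

    Each cell now also inhibits itself, so its self-excitation is increased
    by its own contribution inhibit(v) to compensate.
    """
    total = sum(inhibit(v) for v in x)
    return [w_self * v + inhibit(v) - total for v in x]


def half(v):
    """Hypothetical inhibition function: half of the sender's input."""
    return 0.5 * v


inputs = [1.0, 4.0, 2.0]  # hypothetical input values
print(net_drive_pairwise(inputs, half, 1.0))  # [-2.0, 2.5, -0.5]
print(net_drive_global(inputs, half, 1.0))    # [-2.0, 2.5, -0.5]
```

The pairwise form evaluates N(N - 1) inhibition terms, whereas the global form evaluates N terms once and shares the sum.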
This paper presents an O(N) complexity circuit technique
[which can be represented by the topology in Fig. 1(b)] for
implementing WTA and/or MAX circuits, based on
current-mode principles. The resulting circuit does not rely
on the matching of an N-size transistor array, but on precise
local current replication and comparison. The circuit can be
distributed among several chips, as is sometimes demanded
by neural and/or fuzzy systems [11], without degrading its
precision, as shown in the section on experimental results.
II. CURRENT-MODE IMPLEMENTATION OF WTA-MAX OPERATION
A mathematical model that realizes the WTA-MAX operation
and is suitable for an O(N) current-mode-based
circuit implementation is presented as follows. Consider a
system of N cells, such that each cell produces an output
² If the inhibition that goes from cell i to cell j does not depend on j.
³ The circuit in [2] processes voltage input signals, while the circuits in [1]
and in this paper process current input signals.
0018–9200/98$10.00 © 1998 IEEE
Fig. 1. WTA topologies. (a) WTA of O(N²) complexity, (b) transformation to O(N) complexity, and (c) typical topology of O(N) WTA hardware
implementation. Black triangles represent inhibitory connections, white triangles excitatory connections, and shaded circles are generally nonlinear
time-dependent processing elements whose outputs become (after a transient) either “0” or “1.”
current with is the
step function, is the external input to the ith cell, and
(1)
Fig. 2 graphically represents functions
and . Their intersection provides the solution
to (1). If , (1) has a unique equilibrium point
. Furthermore, if , the value of at the
equilibrium point is and the cell that drives
a nonzero output is the winner. If each input is
changed to , where is an upper bound for all inputs,
, an LTA and/or MIN circuit results. Fig. 3(a)
shows a current-mode circuit that implements the operation of
one cell for the case . It consists of a two-output
current mirror, a digital inverter, and a MOS transistor. Each
cell receives two input currents, and , and delivers one
output current . The inverter acts as a current comparator.
If , the inverter output is low, the MOS transistor
is OFF, and is zero. If , the inverter output
is high, the MOS transistor is ON, and . Fig. 3(b)
depicts the transfer curve of this unit cell. Fig. 3(c) and (d)
show the detailed schematics of the fabricated cells, one in the
double-poly MIETEC 2.4-µm technology and the other in the
single-poly ES2 1.0-µm technology, respectively. Fig. 4 shows
the complete WTA or MAX circuit. It consists of N unit cells
and an additional N-output current mirror. The function of
the N-output current mirror is to deliver the sum of the cell
output currents to each of the unit cells. Replication of this
summed current must be very precise. If the number of unit cells is
too large, or if the circuit has to be distributed among several
chips, high precision in replication cannot be guaranteed by
a single current mirror with N outputs. In this case, replication
of the current must rely on several mirrors with a smaller
number of outputs but with guaranteed precise replication.
Fig. 5 shows an arrangement to distribute the circuit of Fig. 4
among several chips. The fact that this current can be replicated
many times without relying on the matching of a large array
of transistors is the main advantage of this WTA and MAX (or
LTA and MIN) circuit technique over other implementations.
Fig. 2. Graphic representation of the solution of (1).
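The behavior described in this section can be explored with a small numerical sketch. Since the symbols of (1) are not reproduced here, the model below is an assumption: each cell is taken to output `alpha * i_o * sigmoid((input - i_o) / eps)`, where `i_o` is the replicated sum of all cell outputs, `alpha` is a hypothetical mirror gain, and `eps` is a hypothetical small smoothing parameter. The nonzero equilibrium of the current balance is then located by bisection.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function (smooth stand-in for a step)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def balance_residual(i_o, inputs, alpha, eps):
    """Residual of the balance i_o = sum(cell outputs), divided by i_o.

    Assumed cell model: output = alpha * i_o * sigmoid((input - i_o) / eps).
    """
    return alpha * sum(sigmoid((i - i_o) / eps) for i in inputs) - 1.0


def equilibrium(inputs, alpha=2.0, eps=0.01, iters=200):
    """Bisect for the nonzero equilibrium; the residual decreases with i_o."""
    lo, hi = 0.0, 2.0 * max(inputs)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if balance_residual(mid, inputs, alpha, eps) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def winner_index(inputs, i_o, eps=0.01):
    """Index of the cell whose comparator is most strongly ON."""
    return max(range(len(inputs)), key=lambda k: sigmoid((inputs[k] - i_o) / eps))


inputs = [10.0, 30.0, 20.0]  # hypothetical input currents, arbitrary units
i_o = equilibrium(inputs)
print(round(i_o, 6), winner_index(inputs, i_o))  # 30.0 1
```

Under these assumptions the global current settles at the largest input (the MAX result) and only the cell receiving that input stays ON (the WTA result). Nothing in the balance depends on matching across cells; it depends only on how accurately the global current is replicated to each cell.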
III. SYSTEM STABILITY ANALYSIS
Let us assume that the dynamics of each cell [see Fig. 3(a)]
can be modeled by the following first-order nonlinear differ-
ential equation:
(2)
where is the total capacitance available at node
is the total conductance at this node, and is the inverter
Fig. 3. WTA unit cell: (a) simplified schematic, (b) transfer curve, (c) circuit
diagram of the cell fabricated in the MIETEC 2.4-µm technology, and (d) circuit
diagram of the cell fabricated in the ES2 1.0-µm technology.
Fig. 4. Diagram of the WTA circuit.
trip voltage. Let us also assume that the output current of a
cell is given by
(3)
where is a continuous and differentiable approximation
to the step function. For example, we can define as the
sigmoidal function where is positive
and nonzero but close to zero. Now consider (2) for two nodes,
and . Let be the node that eventually should become the
winner. If we subtract (2) for the two nodes and , then
(4)
Fig. 5. Strategy to assemble several chips.
Equation (4) has the following solution:
(5)
After a few time constants , the difference between the
two node voltages will remain constant and equal to their
difference at the equilibrium point. Therefore, if we can obtain
the expression for , applying (5) would obtain
for the rest of the nodes.
Consider now (2) for node , and substitute (3) into it
(6)
Since is given by (5), after a few time constants (6)
becomes
(7)
This first-order differential equation has stable equilibrium
points if . Differentiating (7) with
respect to results in
(8)
Since , and are always positive, (8) is always
negative for all possible values of (including its unique
equilibrium point). Consequently, (7) represents the dynamics
of a stable system.⁴ This discussion assumes that the N-output
current mirror presents no delay. This is not very realistic;
however, it can be shown [12] that the circuit is still stable
when the N-output current mirror presents a delay
modeled by first-order dynamics.
⁴ The stability proof given in [20] for this system is not correct because
it implicitly assumes symmetric interconnection weights between the cells,
which is not true.
Performing electrical simulations of the circuit in Fig. 4
reveals that the previous stability analysis is a good approx-
imation as long as the equilibrium point does not lie in the
transition region of any of the sigmoidal functions .
This can only be guaranteed if and the two largest
inputs and are sufficiently different. If or (with
) if two or more inputs are maximum and very
similar, the equilibrium point of the system (see Fig. 2) will
be in the transition region of some sigmoids . In these
cases, transistor parasitic elements that have been neglected
in the preceding analysis may cause unstable behavior.
Consequently, some kind of compensation is necessary.
Under unstable conditions the system exhibits the following
characteristics (observed through electrical simulations with
HSPICE).
a) Only the cells whose sigmoid functions must be
in their transition region at the equilibrium point are
unstable. The rest of the cells behave as if the system
had reached its equilibrium point.
b) The unstable cells present oscillations (presence of com-
plex conjugate poles).
c) In the case of and with two or more equal
maximum inputs, the steady-state oscillating waveforms
at these cells become the same, regardless of their initial
conditions.
This last observation suggests that a stability analysis could
be performed by simply considering one cell in the system,
which represents the parallel connection of all unstable cells,
as shown in Fig. 6(a). On the other hand, since the unstable
cells have the equilibrium point in the transition region of their
sigmoid , we can linearize these sigmoids for the stability
analysis. Therefore, let us consider the small signal equivalent
circuit shown in Fig. 6(b), where the circuitry enclosed by
dashed lines represents the parallel connection of all cells with equal
and maximum input. The rest of the circuitry models the
N-output current mirror (or set of current mirrors) responsible
for distributing the global current among the cells. The
minimum set of dynamic elements needed for the system to
present unstable oscillating behavior are parasitic capacitors
, and (observed through electrical simulation).
Performing small signal analysis on the circuit in Fig. 6(b),
it can be shown that the stability condition for this circuit is
approximately [12]–[13]
(9)
where is the number of cells with equal and maximum
input. This condition is not easy to satisfy since must be
large for proper operation, may become large, and it is not
trivial to make the right hand side of (9) very large. Stability
compensation can be achieved by introducing capacitor , as
shown in Fig. 6(c). By small signal analysis of this circuit, it




Fig. 6. (a) Parallel connection of unstable cells, (b) uncompensated small
signal equivalent circuit, and (c) compensated small signal equivalent circuit.
Note that now the stability condition does not depend on
gain and is easier to fulfill. However, now capacitor
degrades the settling speed of the system. Capacitor acts
as a Miller capacitance. Since the dc gain from node to
node is approximately (i.e., the negative of the slope
of ), there will be an effective Miller capacitance of value
in parallel with the original capacitor. If the
sigmoid is not in its transition region, , but if the sigmoid
is in its transition region, can be very large. Therefore, for
compensated cells, (7) must be changed to
(11)
If the winning cell is in its transition region,
and a large capacitance is present at node .
Otherwise, and the effective capacitance
is only .
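The speed penalty of the compensation can be estimated with the standard Miller approximation. In the sketch below, the names `c_node` (original node capacitance), `c_comp` (compensation capacitor), and `gain` (magnitude of the dc gain across the compensation capacitor) are assumptions introduced for illustration, as are the numerical values.

```python
def effective_capacitance(c_node, c_comp, gain):
    """Node capacitance plus the Miller-multiplied compensation capacitor.

    Assumes an inverting dc gain of magnitude `gain` across c_comp, so that
    c_comp appears at the node as c_comp * (1 + gain).
    """
    return c_node + c_comp * (1.0 + gain)


# Hypothetical values in picofarads.
print(effective_capacitance(1.0, 0.5, 0.0))   # sigmoid saturated: 1.5
print(effective_capacitance(1.0, 0.5, 99.0))  # sigmoid in transition: 51.0
```

The two cases correspond to a cell whose sigmoid is saturated (gain near zero, small added capacitance) and a cell in its transition region (large gain, large effective capacitance, slower settling).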
IV. EXPERIMENTAL RESULTS
A WTA-MAX system with competing cells
has been designed and fabricated in two different CMOS
technologies. The first prototype has been integrated in a
double-metal single-poly 1.0-µm CMOS technology (ES2),
and the other in a double-metal double-poly 2.4-µm CMOS
process (MIETEC). Both technologies were available through
the European silicon foundry service, EUROCHIP. Circuit
schematics and transistor sizes of the unit cells are shown
in Fig. 3(c) and (d) for the MIETEC 2.4-µm and ES2 1.0-µm
CMOS processes, respectively. Sizes of the PMOS current
mirroring transistors were 175 µm × 4 µm and 151 µm × 2.5 µm
for the MIETEC and ES2 prototypes, respectively.
TABLE I
CURRENT-MODE WTA PRECISION MEASUREMENTS
If the circuit is going to be used as a MAX circuit, all current
mirrors must provide good replication precision. They need
to have small systematic errors and small random deviations
[14], so that the resulting current value matches the
maximum among all inputs as closely as possible. However, if
the circuit is going to be used as a WTA circuit, requirements
are not that severe. Inside a single chip, a WTA
performs the same even if the current mirrors have appreciable
systematic errors. Since systematic errors are common with
respect to all inputs, the system can still determine which
input is maximum. On the other hand, random mismatch
errors in the current mirrors must be kept small because these
errors change randomly from one input to another. Reducing
random errors implies using larger transistor sizes. Reducing
systematic errors implies using more elaborate current mirror
topologies that either reduce their output conductance (using
cascode [15], regulated cascode [16], or gain-boosting [17]
techniques), decrease their input impedance [18], or both [19].
The application we had in mind when we developed this circuit
was a WTA for a multichip real-time clustering system [11].
Consequently, it was not critical that the final value of be an
exact replica of the maximum of the inputs. Therefore, we used
a simple three-transistor current mirror (without any technique
for decreasing output conductance or input impedance) for
the two-output NMOS current mirror of each cell. However,
we used active-input current mirrors [18] for the N-output
PMOS current mirror and for the extra NMOS assembling
current mirror (see Fig. 5). These current mirrors ensure fixed
voltages at their input nodes. This was necessary because, if
the system is distributed among several chips, the presence
of the assembling current mirror would break the symmetry
between some of the inputs, making systematic errors affect
these inputs differently. The following presents proper system
operation of a WTA circuit in a single chip, in two chips
of the same technology, and in two chips each of a different
technology. As will be shown, the dc behavior of the system
is not degraded when the operation is distributed among
several chips. In the remainder of this section we detail
experimental measurements related to the precision of a WTA
and its speed of response.
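The different impact of systematic and random mirror errors on a WTA can be illustrated with a toy model. The sketch assumes that each error acts as a multiplicative gain on the replicated current; this error model, the function names, and all numbers are hypothetical and are not measurements from the prototypes.

```python
def winner(inputs):
    """Index of the largest input."""
    return max(range(len(inputs)), key=lambda i: inputs[i])


def with_systematic_error(inputs, gain):
    """The same gain error is applied to every input (common to all cells)."""
    return [gain * x for x in inputs]


def with_random_errors(inputs, gains):
    """A different gain error is applied to each input (cell mismatch)."""
    return [g * x for g, x in zip(gains, inputs)]


inputs = [100.0, 101.0, 99.0]  # hypothetical currents; cell 1 should win

# A 5% error common to all cells scales every input alike: same winner.
print(winner(with_systematic_error(inputs, 1.05)))  # 1

# Per-cell errors of about 1 to 2% can exceed the 1% input difference.
print(winner(with_random_errors(inputs, [1.02, 0.99, 1.00])))  # 0
```

A common gain error preserves the ordering of the inputs, whereas per-cell errors comparable to the input difference can change the winner. This is why the random deviations must be kept small even when only the WTA function is needed.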
A. Operation Precision
The dc transfer curves of the system have been measured
for different input current levels and for different system
configurations. Fig. 7 shows 30 transfer curves when the
competing cells are inside the same chip. Each curve is
obtained by randomly selecting a pair of input cells and
applying a constant input current to the first, and
Fig. 7. Transfer curves of the WTA implemented in an ES2 1.0-µm chip for
an input current level of 100 µA.
sweeping the input current of the second from 0.9
to 1.1 . The figure represents the two inverter output
voltages and versus the current . For each pair of
cells and , we measured the value of at the point where
. Let us call this value . Thirty curves were
measured for each value of , resulting in 30 values of .
The difference between the mean of these 30 values and
is a measure of the systematic error of . Let us call it .
The variance of the 30 values represents the random error
of . Let us call it . In the case of Fig. 7, corresponding
to a WTA inside one single chip fabricated in the ES2 1.0-µm
CMOS technology with an input current level of 100 µA, we measured a
random deviation of % and a systematic error
of %.
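The two error figures described above can be computed from a set of measured crossing currents as sketched below. The function name and the sample values are hypothetical. The random error is expressed here as the sample standard deviation relative to the reference current; this is an assumption about how the spread of the 30 values is reported as a percentage.

```python
from statistics import mean, stdev


def precision_errors(crossings, i_ref):
    """Return (systematic error %, random deviation %) for one current level.

    crossings: measured input currents at which the two outputs cross.
    i_ref:     constant current applied to the other cell of each pair.
    """
    systematic = 100.0 * (mean(crossings) - i_ref) / i_ref
    random_dev = 100.0 * stdev(crossings) / i_ref
    return systematic, random_dev


# Made-up crossing values in microamperes, for illustration only.
sample = [99.0, 101.0, 100.5, 99.5]
sys_err, rnd_err = precision_errors(sample, 100.0)
print(sys_err)            # 0.0
print(round(rnd_err, 3))  # 0.913
```

In the experiment described above, `crossings` would hold the 30 values obtained from the 30 randomly selected pairs of cells at a given current level.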
Table I contains the measured total error (defined as
) for three decades of change in . The table shows
results for the cases of WTAs inside one chip, assembled
using two chips of the same technology, and assembled with
two chips of different technologies. Note that the precision
degradation is very small when the system is distributed between
two chips, regardless of whether the chips are of the same
technology or not. This is the main advantage of this WTA-MAX
circuit with respect to others reported in the literature [1], [2].
B. Operation Speed
Delay measurements were performed as follows. Only two
input signals were made nonzero. Let us call them and .
Current was made constant and equal to , while current
TABLE II
MEASURED DELAY TIMES FOR ONE-CHIP WTAs
Fig. 8. Transfer curves when two ES2 1.0-µm chips are assembled, for
an input current level of 10 µA.
changed in a pulse between values and
, as shown in Fig. 8(a). The pulse starts at
time and ends at time . Waveforms and have
the shape depicted in Fig. 8(b). Four different delay times
were measured. For the system response caused by a rising
edge in , time is the delay between time and the
instant at which voltage crosses the 50% value of its
range. Delay is the same for output voltage . For the
system response caused by a falling edge in , time is
the delay between time and the instant at which voltage
crosses the 50% value of its range. Delay is the same
for output voltage . Measurements were performed for
values of 10 µA, 100 µA, and 500 µA, and for equal
to 0.2 and . Table II shows the measured delay times
for those cases where the system is inside one single chip.
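The delay definition used above (time from an input edge to the 50% crossing of an output voltage) can be applied to a sampled waveform as sketched below. The function name, the linear interpolation between samples, and the sample waveform are assumptions made for illustration; the sketch also assumes that the record contains both the low and the high level of the output.

```python
def crossing_delay(times, volts, t_edge, level=0.5):
    """Delay from t_edge to the first crossing of `level` of the output range.

    Uses linear interpolation between samples and returns None if the
    waveform never crosses the threshold after t_edge.
    """
    v_lo, v_hi = min(volts), max(volts)
    threshold = v_lo + level * (v_hi - v_lo)
    for k in range(1, len(times)):
        if times[k] <= t_edge:
            continue  # only consider samples after the input edge
        a = volts[k - 1] - threshold
        b = volts[k] - threshold
        if (a < 0.0 <= b) or (a > 0.0 >= b):
            frac = a / (a - b)
            t_cross = times[k - 1] + frac * (times[k] - times[k - 1])
            return t_cross - t_edge
    return None


# Hypothetical samples: the output rises from 0 V to 5 V between t = 2 and t = 3.
t = [0.0, 1.0, 2.0, 3.0, 4.0]
v_rise = [0.0, 0.0, 0.0, 5.0, 5.0]
print(crossing_delay(t, v_rise, t_edge=1.0))  # 1.5
```

The same function handles a falling output, since the crossing test accepts either direction.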
Table III shows the delay times measured when a WTA is
assembled using two chips of the ES2 1.0-µm process. Note
that, in general, speed is degraded for a two-chip WTA. When
the system is scaled up (increasing the number of inputs and
chips) its speed will be further decreased. However, as long
as current levels are maintained, its precision is preserved.
Note that when increasing the number of inputs, the current
levels can be maintained, because in the steady state (for one
single winner) there is only one two-output NMOS mirror ON
and the PMOS mirror(s) drive its corresponding input current.
On the other hand, for stability, (10) has to be satisfied: by
increasing the number of chips, capacitance will increase;
however, if instead of (10) the following condition is imposed:
(12)
the system will remain stable no matter how large is.
TABLE III
MEASURED DELAY TIMES FOR A TWO-CHIP WTA
V. CONCLUSION
A WTA-MAX circuit design technique based on current-
mode signal processing has been proposed. The precision of
the circuit relies on the proper replication and comparison of
currents. This maintains good precision for circuits with a large
number of inputs and when the circuit is distributed among
several chips. Stability analysis of the proposed circuit has
been addressed and stability conditions derived. A stability
compensation scheme has been proposed. Two prototypes, for
two different technologies, have been designed, fabricated, and
tested. Proper performance has been experimentally verified
for both prototypes, as well as for circuits assembled with
different chips, even if each chip is of a different technology.
The performance of this WTA-MAX circuit as compared to
previous implementations [1], [2] is similar for both precision
and speed. Actually, for speed performance, worse results
would be expected with the proposed circuit since it needs
stability compensation. The advantage of the present circuit is
that it does not lose precision when used in multichip systems.
In order to achieve this with previous implementations [1],
[2], some on-chip calibration schemes would be needed to
compensate for interchip systematic transistor mismatch errors.
REFERENCES
[1] J. Lazzaro, S. Ryckebusch, M. A. Mahowald, and C. A. Mead,
“Winner-take-all networks of O(N) complexity,” Advances in Neural Inform.
Processing Syst., vol. 1, pp. 703–711, 1989.
[2] J. Choi and B. J. Sheu, “A high-precision VLSI winner-take-all circuit
for self-organizing neural networks,” IEEE J. Solid-State Circuits, vol.
28, May 1993.
[3] S. Haykin, Neural Networks: A Comprehensive Foundation. New
York: IEEE Press, Macmillan, 1994.
[4] E. Sánchez-Sinencio and C. Lau, Artificial Neural Networks: Paradigms,
Applications, and Hardware Implementations. New York: IEEE Press,
1992.
[5] J. C. Bezdek and S. K. Pal, Fuzzy Models for Pattern Recognition. New
York: IEEE Press, 1992.
[6] S. A. Elias and S. Grossberg, “Pattern formation, contrast control, and
oscillations in the short term memory of shunting on-center off-surround
networks,” Biological Cybernetics, vol. 20, pp. 69–98, 1975.
[7] T. Kohonen, Self-Organization and Associative Memory, 3rd ed.
Berlin: Springer-Verlag, 1989.
[8] A. L. Yuille and D. Geiger, “Winner-take-all mechanisms,” in The
Handbook of Brain Theory and Neural Networks, M. A. Arbib, Ed.
Cambridge, MA: MIT Press, pp. 1056–1060, 1995.
[9] Y. He, U. Cilingiroglu, and E. Sánchez-Sinencio, “A high-density and
low-power charge-based Hamming network,” IEEE Trans. VLSI Syst.,
vol. 1, pp. 56–62, Mar. 1993.
[10] B. Linares-Barranco, E. Sánchez-Sinencio, A. Rodríguez-Vázquez, and
J. L. Huertas, “A modular T-Mode design approach for analog neural
network hardware implementations,” IEEE J. Solid-State Circuits, vol.
27, pp. 701–713, May 1992.
[11] T. Serrano-Gotarredona and B. Linares-Barranco, “A real-time clus-
tering microchip neural engine,” IEEE Trans. VLSI Syst., vol. 4, pp.
195–209, June 1996.
[12] T. Serrano-Gotarredona, “VLSI neural categorizers,” Ph.D. Dissertation,
University of Seville, Dec. 1996.
[13] T. Serrano-Gotarredona and B. Linares-Barranco, “Experimental results
on the current-mode WTA-MAX circuit with multichip capability,” in
1997 IEEE Int. Symp. Circuits and Systems (ISCAS’97), Hong Kong,
1997, vol. 1, pp. 561–564.
[14] M. J. M. Pelgrom, A. C. J. Duinmaijer, and A. P. G. Welbers, “Matching
properties of MOS transistors,” IEEE J. Solid-State Circuits, vol. 24, no.
5, pp. 1433–1440, 1989.
[15] P. E. Allen and D. R. Holberg, CMOS Analog Circuit Design. New York:
Holt, Rinehart and Winston, 1987.
[16] E. Säckinger and W. Guggenbühl, “A high-swing, high-impedance MOS
cascode circuit,” IEEE J. Solid-State Circuits, vol. 25, pp. 289–298, Feb.
1990.
[17] K. Bult and G. J. G. M. Geelen, “The CMOS gain-boosting technique,”
Analog Integrated Circuits and Signal Processing, vol. 1, pp. 119–135,
1991.
[18] D. G. Nairn and A. T. Salama, “A ratio-independent algorithmic
analog-to-digital converter combining current mode and dynamic
techniques,” IEEE Trans. Circuits Syst., vol. 37, pp. 319–325, Mar.
1990.
[19] T. Serrano and B. Linares-Barranco, “The active-input regulated-
cascode current mirror,” IEEE Trans. Circuits Syst.–I, vol. 41, pp.
464–467, June 1994.
[20] T. Serrano and B. Linares-Barranco, “A modular current-mode
high-precision winner-take-all circuit,” IEEE Trans. Circuits Syst.–II,
vol. 42, pp. 132–134, Feb. 1995.
