Sorting networks implemented as νMOS circuits by Rodríguez-Villegas, E. et al.
SORTING NETWORKS IMPLEMENTED AS νMOS CIRCUITS¹
Esther Rodriguez, Jose M. Quintana, Maria J. Avedillo and Adoracion Rueda
Instituto de Microelectrónica de Sevilla, IMSE-CNM, Universidad de Sevilla.
Edif. CICA, Avda. Reina Mercedes s/n, 41012 Sevilla, SPAIN.
¹ This work was partially supported by the Spanish CICYT under Projects TIC95-0094 and TIC97-0648.
Indexing Terms: Digital design, Sorting.
Abstract:
This letter proposes a new realization of n-input sorters. By exploiting the neuron-MOS
(νMOS) concept together with a suitable electrical scheme, a compact and efficient implementation
is obtained.
Introduction:
This letter presents a new hardware realization for the problem of building binary sorting networks
(SN). An n-input SN is a switching network with n inputs and n outputs whose output is a
permutation of the inputs sorted in non-increasing order. Binary SNs are built from comparator
cells, which have two inputs and two outputs: one output provides the maximum of both inputs
and the other the minimum. The internal structure of the comparator depends on the application.
The inputs can be binary numbers, in which case the comparator is a complex element, or binary
signals, in which case the maximum and minimum reduce to the OR and AND operations, respectively.
Figure 1a shows the behavior of an n-input sorter with k inputs equal to 1, and Figure 1b the logic
implementation of the comparator cell. The design of efficient SNs has received considerable
attention for many years [1]; a milestone is the constructive method proposed by Batcher [2].
Figure 1c shows a 4-input SN implemented following Batcher’s method.
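As an illustration of the comparator-cell view, the following Python sketch models the binary comparator of Figure 1b as an (OR, AND) pair and applies it over a 4-input network. The comparator list NETWORK_4 is an assumption: it is the standard five-comparator 4-input network obtained with Batcher's odd-even merge, and the wiring drawn in Figure 1c may differ in detail. The helper names comparator and sort_network are hypothetical.

```python
from itertools import product
from typing import List, Sequence, Tuple


def comparator(a: int, b: int) -> Tuple[int, int]:
    """Binary comparator cell: (max, min) reduces to (OR, AND)."""
    return a | b, a & b


# Assumed wiring: the standard 5-comparator network for 4 inputs
# (Batcher's odd-even merge); the drawing in Fig. 1c may differ.
# Each pair (i, j) sends the max to line i and the min to line j.
NETWORK_4: List[Tuple[int, int]] = [(0, 1), (2, 3), (0, 2), (1, 3), (1, 2)]


def sort_network(x: Sequence[int], network=NETWORK_4) -> List[int]:
    """Apply the comparators in order to a copy of the input lines."""
    lines = list(x)
    for i, j in network:
        lines[i], lines[j] = comparator(lines[i], lines[j])
    return lines


if __name__ == "__main__":
    # Exhaustive check over all 16 binary inputs: outputs are non-increasing.
    for x in product((0, 1), repeat=4):
        assert sort_network(x) == sorted(x, reverse=True)
    print(sort_network([0, 1, 0, 1]))  # [1, 1, 0, 0]
```

Because every binary input pattern is checked, the sketch also illustrates the 0-1 principle: a comparator network that sorts all binary inputs sorts arbitrary inputs as well.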
In this letter, a different approach to implementing a sorter is presented. It is based on the fact
that each output of an n-input sorter depends only on the number of inputs equal to 1: the i-th
output, Oi, is 1 if and only if at least i of the n inputs are 1. Hence, an n-input sorter can be
viewed as a cascade of two blocks. The first block provides an output that depends linearly on the
number of 1’s in the applied inputs. The second block compares this output signal with a set of
n fixed values by means of a battery of comparators, thus providing the n output functions of an
n-input sorter.
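The two-block decomposition can be checked functionally with a short Python sketch. It abstracts away all circuit behaviour: block1_count simply returns the number of 1’s, and block2_compare uses the hypothetical fixed levels i − 0.5, i = 1, …, n, so that Oi = 1 exactly when at least i inputs are 1. All function names are illustrative only.

```python
from itertools import product
from typing import List, Sequence


def block1_count(x: Sequence[int]) -> int:
    """First block: a quantity linear in the number of 1's (here, the count itself)."""
    return sum(x)


def block2_compare(s: float, n: int) -> List[int]:
    """Second block: compare s with n fixed levels.

    O_i (i = 1..n) is 1 iff s > i - 0.5, i.e. iff at least i inputs are 1.
    The half-step levels are an illustrative choice.
    """
    return [1 if s > i - 0.5 else 0 for i in range(1, n + 1)]


def two_block_sorter(x: Sequence[int]) -> List[int]:
    """Cascade of the two blocks."""
    return block2_compare(block1_count(x), len(x))


if __name__ == "__main__":
    n = 8
    # Exhaustive check: the cascade equals a non-increasing sort of the inputs.
    for x in product((0, 1), repeat=n):
        assert two_block_sorter(x) == sorted(x, reverse=True)
    print(two_block_sorter([0, 1, 1, 0, 0, 1, 0, 0]))  # [1, 1, 1, 0, 0, 0, 0, 0]
```

The exhaustive loop covers all 256 vectors for n = 8, so no comparator network is needed once the count is available.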
Highly functional νMOS transistors, which perform a weighted summation of multiple input signals
at the gate level, have recently been developed [3]. νMOS transistors have a buried floating
polysilicon gate and a number of input polysilicon gates that couple capacitively to the floating
gate. The floating-gate voltage becomes a weighted sum of the input-gate voltages, and it is this
sum that controls the current in the transistor channel. A schematic of this transistor, with a
floating gate and input gates x1, x2, …, xn, is shown in Fig. 2. The weight of each input is given by
the ratio of the corresponding input capacitance, Ci, between the floating gate and that input gate,
to the total capacitance, which includes the transistor channel capacitance, Cchan, between the
floating gate and the substrate.
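For an ideal νMOS transistor with no stored floating-gate charge and the substrate at 0 V, the weighting described above is VFG = Σ Ci·Vi / Ctot with Ctot = Σ Ci + Cchan. The Python sketch below evaluates this expression; the idealizations (lossless capacitors, zero initial charge, grounded substrate) and the helper names are assumptions for illustration, not taken from [3].

```python
from typing import List, Sequence


def total_capacitance(c_in: Sequence[float], c_chan: float) -> float:
    """Ctot = sum of input coupling capacitances + channel capacitance."""
    return sum(c_in) + c_chan


def weights(c_in: Sequence[float], c_chan: float) -> List[float]:
    """Weight of input i is Ci / Ctot (ideal capacitive divider)."""
    c_tot = total_capacitance(c_in, c_chan)
    return [c / c_tot for c in c_in]


def floating_gate_voltage(v_in: Sequence[float], c_in: Sequence[float],
                          c_chan: float) -> float:
    """Ideal VFG = sum(Ci * Vi) / Ctot, assuming zero floating-gate charge."""
    if len(v_in) != len(c_in):
        raise ValueError("one coupling capacitance per input gate is required")
    return sum(w * v for w, v in zip(weights(c_in, c_chan), v_in))


if __name__ == "__main__":
    # Hypothetical normalized values: three unit input capacitances, Cchan = 1.
    print(floating_gate_voltage([5.0, 5.0, 0.0], [1.0, 1.0, 1.0], 1.0))
```

With equal coupling capacitances, as used for the sorter inputs, all weights are equal and VFG grows by the same amount for every input switched to 1.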
An 8-input sorter has been designed that exploits the above two-block approach and relies on
the νMOS transistor principle for its implementation, since the first block must count the number of
1’s in the inputs or, equivalently, perform an arithmetic addition of the inputs. The compact
architecture and the efficient physical implementation we propose compare very favorably with
the traditional solution.
Electrical realization of a sorter circuit
Figure 3 shows the two-stage schematic diagram of the proposed n-input sorter. The implementation
of the first block relies on the νMOS principle and on current mirroring to provide an analog output
voltage, V1, which increases in a staircase shape, proportionally to the number of binary inputs equal
to 1. This operation is performed by transistors M1-M4 operating in their saturation regions.
Transistors M2 and M4 are equally sized n-channel νMOS transistors, and M1 and M3 are equal PMOS
transistors. The sorter inputs, x1, x2, …, xn, are M2 input gates capacitively coupled to its floating
gate through identical coupling capacitances, Cu, which produces a floating-gate voltage, VFG, that
depends linearly on the sum of the inputs. However, with this circuit several input combinations with
different numbers of 1’s can give floating-gate voltages below the threshold voltage of the NMOS
transistor, so they cannot be distinguished. This offset is avoided by injecting an initial charge into
the M2 floating gate. For this purpose, inverter I1 has been included, as well as two additional inputs
to transistor M2 with coupling capacitances Cu/2 and C0. With ΦR = 1 (initialization mode), switches
controlled by this phase short-circuit the M2 floating gate and the output and input of I1, and the
input terminals x1, x2, …, xn are connected to ground (input switches not shown in Figure 3). After
initialization, when ΦR = 0 (processing mode),

$$V_{FG} = \sum_{i=1}^{n} x_i \, \frac{V_{DD} C_u}{C_{tot}} + V_{I1}^{*} - \frac{V_{DD} \, (C_u/2)}{C_{tot}}$$

where $V_{I1}^{*}$ is the threshold voltage of inverter I1 and $C_{tot} = (n + 1/2) C_u + C_{chan} + C_0$.
Capacitance C0 is introduced by the extra grounded input in order to keep M2 saturated even when all
n inputs of the sorter are at logical 1. This VFG controls the current through M1 and M3. Since M4 is
made equal to M2, this circuit produces a voltage V1 = VFG at the M4 drain terminal. The purpose of
using this scheme to obtain the analog output voltage V1 is twofold. First, it makes the operation
insensitive to parasitic charges in the floating gate, thus avoiding the need for post-fabrication UV
erasure. Second, it makes the resulting staircase voltage robust against process parameter variations.
The second block consists of the set of comparators, which have been implemented as inverters. Each
inverter is sized so that its threshold voltage lies between two given consecutive steps of the
staircase mentioned above. For example, output O1 must be a logical one if at least one input is at
logical one, so the threshold voltage of inverter IO1 is set to

$$V_{IO1}^{*} = \frac{V_1(0) + V_1(1)}{2}$$

where $V_1(0)$ stands for the voltage at node V1 when the all-zero input vector is applied and
$V_1(1)$ corresponds to the voltage at node V1 when an input vector with only one 1 is applied.
Due to process parameter variations, the voltages $V_1(i)$, $i = 0, \ldots, n$, as well as the threshold
voltages of the comparator inverters, can deviate from their nominal values. To reduce this
sensitivity, I1 has been made identical to IO1. The role of capacitor Cu/2 now becomes clear: it
ensures that, with all inputs at logical zero, the V1 voltage is below $V_{IO1}^{*}$. Indeed, from the
expression for VFG, $V_1(0) = V_{I1}^{*} - V_{DD}(C_u/2)/C_{tot}$ and $V_1(1) = V_{I1}^{*} + V_{DD}(C_u/2)/C_{tot}$,
so with $V_{I1}^{*} = V_{IO1}^{*}$ the IO1 threshold lies exactly halfway between the first two steps.
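To see how the staircase and the comparator thresholds fit together, the Python sketch below evaluates the processing-mode expression for VFG, takes V1 = VFG (ideal mirroring), places the threshold of each inverter IOi midway between V1(i−1) and V1(i), and checks all 256 input vectors of an 8-input sorter. VDD = 5 V matches the supply used in the letter; Cu, Cchan, C0 and V*I1 are hypothetical values chosen only for illustration, giving a step height VDD·Cu/Ctot of 0.2 V. Placing every threshold midway between consecutive steps generalizes the IO1 rule above and is an assumption, and the comparators are modeled as ideal threshold elements whose polarity is chosen so that Oi = 1 above threshold.

```python
from itertools import product
from typing import List, Sequence

# VDD matches the letter's 5 V supply; the other values are hypothetical.
VDD = 5.0        # supply voltage [V]
CU = 10e-15      # unit coupling capacitance Cu [F] (hypothetical)
CCHAN = 20e-15   # floating-gate-to-channel capacitance Cchan [F] (hypothetical)
C0 = 145e-15     # extra grounded-input capacitance C0 [F] (hypothetical)
V_I1 = 2.2       # threshold voltage V*_I1 of inverter I1 [V] (hypothetical)


def c_tot(n: int) -> float:
    """Ctot = (n + 1/2) Cu + Cchan + C0."""
    return (n + 0.5) * CU + CCHAN + C0


def v_fg(x: Sequence[int]) -> float:
    """Processing-mode VFG (PhiR = 0) for binary inputs x_i in {0, 1}."""
    ct = c_tot(len(x))
    return VDD * CU / ct * sum(x) + V_I1 - VDD * (CU / 2) / ct


def staircase(n: int) -> List[float]:
    """V1(i) for i = 0..n inputs at 1, assuming ideal mirroring (V1 = VFG)."""
    return [v_fg([1] * i + [0] * (n - i)) for i in range(n + 1)]


def thresholds(n: int) -> List[float]:
    """Assumed threshold of IO_i: midway between V1(i-1) and V1(i)."""
    v1 = staircase(n)
    return [(v1[i - 1] + v1[i]) / 2 for i in range(1, n + 1)]


def sorter_outputs(x: Sequence[int]) -> List[int]:
    """O_i = 1 when V1 exceeds the IO_i threshold (ideal comparators)."""
    v1 = v_fg(x)
    return [1 if v1 > t else 0 for t in thresholds(len(x))]


if __name__ == "__main__":
    n = 8
    print([round(v, 3) for v in staircase(n)])
    # V*_IO1 coincides with V*_I1, which is why I1 is made identical to IO1.
    assert abs(thresholds(n)[0] - V_I1) < 1e-12
    for x in product((0, 1), repeat=n):
        assert sorter_outputs(x) == sorted(x, reverse=True)
```

In this idealized model the first threshold always equals V*I1 regardless of Cu, Cchan or C0, which mirrors the argument for making I1 identical to IO1; real devices would add mismatch, mirror error and finite comparator gain not modeled here.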
Design and Evaluation of an 8-input sorter
An 8-input νMOS sorter has been designed and laid out in a 0.8 µm double-poly CMOS process.
To reduce the load that the comparators place on the first block, the M3-M4 branch has been
replicated. The comparators have been implemented as chains of inverters to regenerate the output
signals to full logic swing. Correct operation under process and ambient parameter variations has
been validated through extensive HSPICE simulations of the extracted circuit, including Monte Carlo
simulations and simulations using different standard worst-case device parameters. Figure 4 plots
these simulation results for nodes V0 and V1 as functions of the number of inputs equal to 1. As can
be seen, the variations observed at V0 are significantly reduced at V1.
For comparison, we have also designed and laid out an 8-input sorter following Batcher’s
conventional approach, which consists of a network of comparator cells. There are 23 such cells in
the 8-input SN. Table 1 compares the area, time performance and power consumption of both sorters.
Timing characteristics and average power have been obtained from post-layout simulations using
typical device parameters at a supply voltage of 5 V. The worst-case delay corresponds to the input
vector or input sequence for which the circuit operation is slowest. In the conventional design, an
input vector exciting the true longest path has been used to measure this delay. In the νMOS
counterpart, an input vector consisting of only 1’s followed by an input vector consisting of only
0’s has been employed. The power has been measured using a randomly generated input sequence
of 100 vectors.
Conclusions
A new νMOS-based realization of n-input sorters has been proposed, and its feasibility has been
illustrated with an 8-input sorter. Compared to a conventional gate-based implementation, the νMOS
design is very efficient in terms of area: it occupies 5625 µm² versus 45400 µm², nearly an order of
magnitude less than its conventional counterpart, while also exhibiting a shorter worst-case delay
(4.1 ns versus 5.2 ns). Regarding power consumption, it has been observed that it depends strongly on
frequency in the conventional approach, unlike in the νMOS design. At frequencies of 170 MHz and
above, the νMOS sorter consumes less power than the conventional one.
References
1 D.E. Knuth, The Art of Computer Programming, Vol. III, Sorting and Searching, 2nd ed.
Reading, MA: Addison-Wesley, 1973, ch. 5.
2 K.E. Batcher, “Sorting Networks and their Applications”, in Proc. 1968 SJCC, AFIPS, vol.
32, 1968, pp. 307-314.
3 T. Shibata, T. Ohmi, “A functional MOS transistor featuring gate-level weighted sum and
threshold operations”, IEEE Trans. on Electron Devices, 39, (6): 1444-1455, 1992.
Captions to the figures:
Figure 1: a) Sorting Network with k binary signal inputs equal to 1.
b) Logic gate implementation of the comparator cell (2-input SN).
c) Batcher’s implementation of a 4-input SN.
Figure 2: Schematic of the νMOS transistor.
Figure 3: Two-stage schematic of the proposed n-input sorter.
Figure 4: Simulated behaviour of nodes V0 and V1 as functions of the number of inputs
equal to 1, showing the stabilizing action of the circuit.
Table 1: Area, time performance and power consumption of νMOS and conventional sorters.
Table 1:
                Area         Worst-case delay    Power consumption (@175 MHz)
νMOS            5625 µm²     4.1 ns              7.8 mW
conventional    45400 µm²    5.2 ns              8.2 mW
[Figure 1 (image not reproduced): (a) sorting network fed with a binary n-tuple x1, …, xn containing k 1’s, whose k first outputs go to 1 and n-k following outputs go to 0 (outputs O1, …, On); (b) comparator cell built from an OR gate (x1 + x2) and an AND gate (x1·x2); (c) 4-input SN with inputs x1-x4 and outputs O1-O4, built from five 2-input SNs.]
[Figure 2 (image not reproduced): νMOS transistor with input gates x1, x2, x3, …, xn.]
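[Figure 3 (image not reproduced): first stage with transistors M1-M4, inverter I1, ΦR-controlled switches, inputs x1, x2, …, xn and nodes VFG, V0 and V1; second stage with comparator inverters IO1, IO2, …, IOn producing outputs O1, O2, …, On.]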
[Figure 4 (image not reproduced): voltages at nodes V0 and V1 (two panels, in V) versus the number of inputs equal to 1, Σ xi for i = 1…8 (0 to 8), for 1) worst case (zero), 2) Monte Carlo and 3) worst case (one) simulations.]