A compact modular architecture for the realization of high-speed binary sorting engines based on rank ordering by Hatirnaz, I. & Leblebici, Y.
A COMPACT MODULAR ARCHITECTURE FOR THE REALIZATION OF HIGH-SPEED
BINARY SORTING ENGINES BASED ON RANK ORDERING
İ. Hatırnaz, F. K. Gürkaynak and Y. Leblebici
Worcester Polytechnic Institute
Department of Electrical and Computer Engineering
Worcester, MA 01609-2280
ABSTRACT
A new modular architecture is presented for the real-
ization of high-speed binary sorting engines, based on ef-
ficient rank ordering. Capacitive Threshold Logic (CTL)
gates are utilized for the implementation of the multi-input
programmable majority (voting) functions required in the
architecture. The overall complexity of the proposed bit-
serial architecture increases linearly with the number of in-
put vectors to be sorted (window size = m) and with the bit-
length of the input vectors (word size = n), and the sorter
architecture can be easily expanded to accommodate large
vector sets. Detailed simulations indicate that the sorter
structure can operate at sampling clock rates of up to 50
MHz, where the throughput is boosted by fine-grain pipelin-
ing. It is demonstrated that the proposed sorting engine
is capable of producing a fully sorted output vector set in
(m+n-1) clock cycles, i.e., in linear time.
1 Introduction
The task of sorting an arbitrarily ordered vector set accord-
ing to magnitude (either from-largest-to-smallest or from-
smallest-to-largest) is one of the fundamental operations re-
quired in many digital signal processing applications. It
is also an expensive operation in terms of area-time com-
plexity; software-based solutions require word-level sort-
ing and can become computationally intensive, while the
overall complexity of hardware-based solutions usually in-
creases very rapidly with the size of the input vector set
(number of vectors) and with the bit-length of the input
vectors [6], [1], [3]. The design of efficient sorting engine
architectures is therefore a significant challenge for over-
coming the computational bottleneck of the binary sorting
problem. A number of recent proposals for the realization
of sorting networks rely primarily on median or rank order
filters (ROF), yet their capabilities in terms of window size
and bit-length are typically limited due to rapidly increasing
hardware complexity [6], [7], [2].
In this work, we present a compact and fully modular
sorting engine architecture that is capable of processing a
large number of input vectors in linear time. The overall
architecture is completely scalable to accommodate a wide
range of window sizes and bit-lengths, and the hardware
complexity only grows linearly with both of these param-
eters. The proposed sorter architecture is essentially based
on a fully programmable modular ROF design that was pre-
sented earlier [10], [11]. In the following, we first discuss
the programmable ROF architecture that forms the basis of
the sorting engine, in Section 2. The realization of the sorter
is presented in Section 3, followed by conclusions in Sec-
tion 4.
Figure 1: One-dimensional illustration of the rank-ordering
process. A sliding window containing m input vectors (words),
here V1 through V9, moves along the time axis. Rank ordering
of the current window elements selects the i-th ranked vector
in the window; the largest vector (maximum filter), the median
vector (median filter) and the smallest vector (minimum filter)
are marked as special cases.
2 The Programmable ROF Architecture
The rank order filter (ROF) is a non-linear digital filter which
determines the i-th ranking element in a given window
consisting of binary encoded input words (Fig. 1). Special
cases of rank order filters are the median, minimum and
maximum filters, whose outputs are the median, the minimum
and the maximum values of the input words, respectively
[1]. Variants of ROFs, especially median filters, are widely
used in digital signal and image/video processing because
of their non-linear characteristics.
0-7803-5482-6/99/$10.00 ©2000 IEEE
ISCAS 2000 - IEEE International Symposium on Circuits and Systems, May 28-31, 2000, Geneva, Switzerland
Figure 2: Detailed signal flow between modular ROF-cells and
majority gates. The diagram shows the ROF cells and majority
decision blocks of bit-planes i and i+1, connected by the
signals Data_Input_Bus<i>, Shifted_Data_In<i>, Data<i:j>,
Select<i:j>, Majority_Output<i> and Filter_Output<i>, together
with the Rank_Selection_Bus and CLK.
In recent years, some innovative bit-serial structures for
rank-order filters have been presented, which are mostly
based on majority-decision algorithms [4], [9]. Yet, the ma-
jority function is typically hard to realize using conventional
Boolean building blocks, since it requires a large number of
gates and a large logic depth.
2.1 The Rank Ordering Algorithm
A bit-serial algorithm first proposed in [6] was chosen as
the basis of the programmable rank-order filter architecture
implemented in this work. In this algorithm, the problem
of finding a rank-order-selection for n-bit long words is re-
duced to finding “n” rank-order-selections for 1-bit num-
bers.
The algorithm starts by processing the most significant
bits (MSB) of the m = (2N + 1) words in the current window
through an m-input programmable majority gate, to yield
the MSB of the desired filter output. This output bit is then
compared with the MSB of each window element. A vector whose
MSB differs from the filter output is already known to be
larger or smaller than the output, so its MSB is propagated
down by one position, replacing the less significant bits of
the corresponding word. Repeating this step at each successive
bit position yields the remaining bits of the filter output.
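The steps above can be sketched in software as follows. This is a minimal behavioural model, not the hardware data path: the function name `rank_select`, the convention that k = 1 selects the maximum, and the sample words are assumptions made for illustration. The programmable majority gate is modelled as a threshold decision that outputs 1 when at least k of its inputs are 1.

```python
def rank_select(words, n_bits, k):
    """Return the k-th largest of `words` (assumed convention: k = 1 is the maximum)."""
    m = len(words)
    # overridden[i] holds the bit value propagated down for word i once one of
    # its bits has disagreed with the filter output; None means "not yet".
    overridden = [None] * m
    result = 0
    for pos in range(n_bits - 1, -1, -1):  # process the MSB first
        column = [
            (w >> pos) & 1 if o is None else o
            for w, o in zip(words, overridden)
        ]
        # Programmable majority (threshold) decision: 1 iff at least k ones.
        out_bit = 1 if sum(column) >= k else 0
        result = (result << 1) | out_bit
        # Words that disagree with the output keep that bit for all lower positions.
        for i, bit in enumerate(column):
            if overridden[i] is None and bit != out_bit:
                overridden[i] = bit
    return result


words = [9, 3, 14, 7, 3]            # five hypothetical 4-bit input words
maximum = rank_select(words, 4, 1)  # 14
median = rank_select(words, 4, 3)   # 7
minimum = rank_select(words, 4, 5)  # 3
```

With a window of five words, ranks 1, 3 and 5 give the maximum, median and minimum filters mentioned earlier.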
The bit-serial operation flow of the algorithm described
above suggests a very simple bit-level pipelined data path
architecture, consisting of data modifier-propagator blocks
to handle fine-grained data selection, and majority decision
blocks to determine output bits.
Identical 1-bit filter slices can be used in sequence (cas-
cade configuration) in order to process input vectors of ar-
bitrary bit-length. The modular structure of the one-bit slice
described above also allows for scalable realization of the
ROFs with different window sizes and word lengths. De-
tails of the realization of this rank ordering algorithm were
presented earlier in [10] and [11].
2.2 Implementation of the Programmable ROF
Architecture
A programmable rank-order filter of any window size and
word-length can be realized by using the two main blocks
described above. The word-length dictates the number of
the majority decision gates, whereas the window size de-
termines the number of ROF-cells driving one of these ma-
jority gates. The programmable majority decision gates are
realized using the capacitive threshold logic (CTL) circuit
architecture presented earlier [5]. This allows simple
implementation of programmable majority gates with up to 63
parallel inputs, using a very small silicon area (625 µm x
130 µm for a 63-bit majority gate).
The signal flow between the ROF cells and the majority
gates is shown in Figure 2. The modular architecture
consisting of only two major blocks enables fully scalable
construction of filter structures of arbitrary size. This struc-
ture operates with a latency of (n-1) clock cycles, producing
one binary output vector (corresponding to the kth ranked
vector in the input window) in each clock cycle.
Figure 3: Overall architecture of the proposed sorter engine.
The data input passes through a data multiplexer (DATA MUX)
and an input shift register array, used to obtain bit-wise
staggered data vectors, into the ROF core of ROF cells and
majority gates. The control block drives the rank control bus.
An output shift register array, used to convert the bit-wise
staggered output vectors back to normal, delivers the sorter
output, and a data circulation path returns the data to the
multiplexer.
3 Realization of the Sorting Engine
The proposed sorter architecture exploits the fact that the
modular ROF core described in Section 2 is capable of gen-
erating one output vector per clock cycle, corresponding to
the currently selected rank. If the ranking process is re-
peated on the same set of vectors instead of processing a
continuous stream of new vectors, the members of the vec-
tor set can be sorted in linear time by simply changing (in-
creasing or decreasing) the rank in each clock cycle. The
overall architecture of the sorting engine is shown in Figure
3. The flow of data through the modular ROF core is
regulated by complementary input and output shift registers,
which are used to stagger the individual bit-planes of each
input vector to enable bit-level pipelined operation. The
multiplexer on the input side is used for accepting the input
vectors at the rate of one vector per clock cycle, as well as
for circulating (rotating) the data until sorting is completed.
The control logic is responsible for regulating the data cir-
culation path, and for applying the rank selection signals to
the individual bit-planes, in ascending or descending order.
The fact that each individual bit-plane is capable of process-
ing a different rank at any given time significantly increases
the overall efficiency of this architecture. In a typical sort-
ing run, the control logic simply requests each bit-plane to
process a different rank in each clock cycle, either begin-
ning from the maximum rank and descending, or beginning
from the minimum rank and ascending.
The operation of the proposed sorting engine is illus-
trated with an example in Figure 4. Here, five 4-bit vectors
(A through E) are being sorted by the ROF core. Note that
the first rank (R1) is initially applied to the MSB plane con-
sisting of the bits A1 through E1. In the next clock cycle, the
same rank is used to process the lesser-significant bit-plane
(A2 through E2), while a new rank (R2) is being applied to
the MSB plane. Also note that the staggered data bits are
gradually circulated from the end of the chain to the front,
so that each vector in the window can be completely pro-
cessed. The entire operation requires only (m+n-1) clock
cycles after all input vectors are applied. It is important to
note that the time-complexity of the sorting operation de-
scribed above has a linear dependence both with respect to
window size (m) and with respect to word-length (n).
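The rank schedule described above can be sketched as a small cycle-level model. It assumes that in cycle t the bit-plane with index b (0 for the MSB plane) handles rank t - b, and that the per-rank selection state travels down the bit-planes with that rank. The function name `sort_pipelined`, the rank convention (rank 1 is the maximum) and the sample data are illustrative assumptions. The model does not reproduce the staggering shift registers or the bit rotation shown in Figure 4.

```python
def sort_pipelined(words, n_bits, descending=True):
    """Sort `words` by sweeping the rank; return (sorted_list, cycle_count)."""
    m = len(words)
    # State carried down the bit-planes, one entry per rank in flight:
    # the propagated bit of each word (None = not yet overridden) and the
    # output bits produced so far for that rank.
    overridden = {r: [None] * m for r in range(1, m + 1)}
    partial = {r: 0 for r in range(1, m + 1)}
    outputs = []
    cycles = 0
    for t in range(1, m + n_bits):         # cycles 1 .. m+n-1
        cycles += 1
        for plane in range(n_bits):        # plane 0 is the MSB plane
            r = t - plane                  # rank handled by this plane now
            if not 1 <= r <= m:
                continue                   # this plane is idle in this cycle
            # Threshold applied to the majority gate for this rank.
            k = r if descending else m + 1 - r
            pos = n_bits - 1 - plane
            column = [
                (w >> pos) & 1 if o is None else o
                for w, o in zip(words, overridden[r])
            ]
            out_bit = 1 if sum(column) >= k else 0
            partial[r] = (partial[r] << 1) | out_bit
            for i, bit in enumerate(column):
                if overridden[r][i] is None and bit != out_bit:
                    overridden[r][i] = bit
            if plane == n_bits - 1:        # LSB plane: output word complete
                outputs.append(partial[r])
    return outputs, cycles


# Five hypothetical 4-bit vectors, as in the m = 5, n = 4 example.
result, cycles = sort_pipelined([9, 3, 14, 7, 3], n_bits=4)
print(result, cycles)  # [14, 9, 7, 3, 3] 8
```

In this model rank r reaches the last bit-plane in cycle r + n - 1, so the last of the m ranks completes in cycle m + n - 1, which matches the cycle count stated in the text.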
A complete VHDL model of the proposed sorter ar-
chitecture has also been developed to verify its operation.
The synthesized sorter architecture contains a total of 2190
NAND-equivalent gates, including the input and output shift
registers, data multiplexer, and the ROF core. Note that each
8-bit majority function in this synthesized structure requires
130 NAND-equivalent gates, while each ROF cell has a
logic complexity of about 14 NAND gates. Fig. 5 shows
simulated results of two sorting operations on an arbitrarily
ordered set of eight vectors, each with a word-length of 8
bits. It can be seen that the first output vector is generated
with a latency of (n-1) clock cycles, after the last vector of
the set is entered.
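As a rough cross-check of the gate counts quoted above, the following sketch estimates the share of the ROF core. It assumes that the synthesized structure is the m = 8, n = 8 configuration used in the simulation, that the core contains one majority gate per bit-plane and m x n ROF cells (as described in Section 2.2), and that the "about 14" figure is taken as exactly 14. The function and constant names are illustrative.

```python
M_WINDOW = 8              # assumed window size of the synthesized sorter
N_BITS = 8                # assumed word-length of the synthesized sorter
TOTAL_GATES = 2190        # total NAND-equivalent gates quoted in the text
MAJORITY_GATE_COST = 130  # NAND-equivalent gates per 8-bit majority function
ROF_CELL_COST = 14        # approximate NAND gates per ROF cell


def rof_core_gates(m, n, majority_cost, cell_cost):
    """Estimate the ROF core size: n majority gates plus m*n ROF cells."""
    majority_gates = n    # one majority gate per bit-plane
    rof_cells = m * n     # m cells drive each majority gate
    return majority_gates * majority_cost + rof_cells * cell_cost


core = rof_core_gates(M_WINDOW, N_BITS, MAJORITY_GATE_COST, ROF_CELL_COST)
remainder = TOTAL_GATES - core
print(core, remainder)  # 1936 254
```

Under these assumptions the ROF core accounts for roughly 1936 of the 2190 gates, leaving on the order of 250 gates for the shift registers, the data multiplexer and any other logic. Both terms of the estimate grow linearly in n, and the cell term linearly in m, in line with the stated linear complexity.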
Figure 4: Illustration of a sorting operation on five 4-bit
input vectors in four steps: (a) The staggered input vectors are
shifted into the ROF core, and the first rank (R1) is applied
to the MSB plane. (b) The MSB of the first input vector
(A1) is rotated, R1 is applied to the next bit-plane, and the
new rank R2 is applied to the MSB plane. (c) B1 and A2
are rotated, while R1 is applied to the lesser-significant
bit-plane. The rank R2 shifts down by one, while R3 is applied
to the MSB plane. (d) Bit circulation (rotation) continues,
while the ranks propagate down the bit-planes in descending
order.
4 Conclusion
A modular architecture has been presented for the realization
of high-speed binary sorting engines, based on an efficient
rank ordering scheme. The overall complexity of the
proposed bit-serial architecture increases linearly with the
number of input vectors to be sorted and with the bit-length
of the input vectors. It was demonstrated that the proposed
sorting engine is capable of producing a fully sorted output
vector set in (m+n-1) clock cycles.
Figure 5: Simulation results of a ranking operation on an
arbitrarily ordered set of eight vectors. The input set is being
sorted in descending order from maximum (233) to minimum (11)
value. The input set can also be sorted in ascending order
from minimum to maximum value, simply by changing the rank
sequence applied to the majority gates of each bit-plane.
REFERENCES
[1] D.S. Richards, “VLSI median filters”, IEEE Trans. Acoust.,
Speech, Signal Processing, vol. 38, pp. 145-152, January 1990.
[2] W.K. Lam and C.K. Li, “Binary sorter by majority gate”,
IEE Electronics Letters, vol. 32, July 1996.
[3] P. Wendt et al., “Stack filters”, IEEE Trans. Acoust., Speech,
Signal Processing, pp. 898-911, 1986.
[4] A. Gasteratos, I. Andreadis, Ph. Tsalides, “Realization of
rank order filters based on majority gate”, Pattern Recognition,
vol. 30, no. 9, pp. 1571-1576, 1997.
[5] Y. Leblebici, F.K. Gürkaynak, D. Mlynek, “A compact
31-input programmable majority gate based on capacitive
threshold logic”, in Proc. IEEE Int. ASIC Conference 1998,
pp. 281-285.
[6] B.K. Kar, D.K. Pradhan, “A new algorithm for order statistic
and sorting”, IEEE Trans. on Signal Processing, vol. 41,
pp. 2688-2694, August 1993.
[7] C.C. Lin, C.J. Kuo, “Fast response 2-D rank order algorithm
by using max-min sorting network”, International Conference
on Image Processing 1996, vol. 1, pp. 403-406.
[8] C. Chen, L. Chen, T. Chiueh, J. Hsiao, “An efficient
pipelined VLSI implementation of rank order filter”,
ISSIPNN 1994, vol. 2, pp. 630-633.
[9] C.L. Lee and C.W. Jen, “Bit-sliced median filter design
based on majority gate”, in Proc. Inst. Elec. Eng.-G, vol. 139,
pp. 63-71, 1992.
[10] İ. Hatırnaz, F.K. Gürkaynak, Y. Leblebici, “Realization of a
programmable rank-order filter architecture using capacitive
threshold logic gates”, ISCAS’99 Proceedings, 1999.
[11] İ. Hatırnaz, F.K. Gürkaynak, Y. Leblebici, “A modular and
scalable architecture for the realization of high-speed
programmable rank-order filters”, ASIC/SOC’99 Proceedings,
pp. 382-386, 1999.
