We present a new scalable architecture for the realization of fully programmable rank order filters (ROF). Capacitive Threshold Logic (CTL) gates are utilized for the implementation of the multi-input programmable majority (voting) functions required in the architecture. The CTL-based realization of the majority gates used in the ROF architecture allows the filter rank as well as the window size to be user-programmable, using a much smaller silicon area compared to conventional realizations of digital median filters. The proposed filter architecture is completely modular and scalable, and the circuit complexity grows only linearly with maximum window size (m) and with word length (n). A prototype of the proposed filter circuit has been designed and fabricated using double-polysilicon 0.8 µm CMOS technology. Detailed post-layout simulations and test results of the ROF prototype circuit indicate that the new architecture can accommodate sampling clock rates of up to 50 MHz, corresponding to an effective data processing rate of 800 Mb/s for a very large filter with window size 63 and word length of 16 bits.
1. INTRODUCTION
As a generic definition, the rank order filter (ROF) is a non-linear digital filter which determines the i-th ranking element in a given window consisting of (m) binary encoded input words (vectors). In a simple one-dimensional example, the rank-order filter would process a certain number of input vectors contained in a sliding window, and produce an output that corresponds to the i-th ranked vector in the current window, as illustrated in Figure 1. As the sliding window moves by one vector, the overall ranking (rank-ordering) will have to be updated to produce the next output, again corresponding to the i-th ranked vector in the new window. Figure 2 shows a two-dimensional application of this principle on a simple (3 × 3) window, where the center element (pixel) is being replaced by the median value contained in the current window.
FIGURE 2 Two-dimensional application of the rank-order (median) function to a (3 × 3) window. Panels: original data window; median filtered output.
Popular special cases of rank order filters are median, minimum and maximum filters, where the output is determined as the median, the minimum or the maximum value within the input window, respectively [1]. Variants of ROFs are widely used in digital signal and image/video processing because of their non-linear characteristics. Median filters in particular have found many applications in digital image enhancement, such as reducing high-frequency and impulsive noise in digital images without extensive blurring and edge destruction [2, 3]. Other successful applications of ROFs include the smoothing of noisy pitch contours in speech signals, data compression in block truncation coding schemes, speckle noise reduction in coherent imaging systems, and preprocessing data for machine vision.
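The generic definition above can be illustrated with a small sort-based reference model. This is only a software sketch for checking results, not the hardware method of this paper; the function name `rank_order_filter` and the convention that rank k counts from the largest value (k = 1 is the maximum) are assumptions made for illustration.

```python
def rank_order_filter(samples, m, k):
    """Return the k-th largest value of every length-m sliding window.

    Assumption: rank k counts from the largest value, so k = 1 gives the
    maximum filter, k = m the minimum filter, and k = (m + 1) // 2 the
    median filter for odd m.
    """
    if not 1 <= k <= m:
        raise ValueError("rank k must satisfy 1 <= k <= m")
    out = []
    for start in range(len(samples) - m + 1):
        window = samples[start:start + m]
        # Word-level sorting: simple in software, costly in hardware.
        out.append(sorted(window, reverse=True)[k - 1])
    return out


signal = [3, 9, 4, 52, 3, 8, 6]
print(rank_order_filter(signal, m=3, k=2))  # median filter: [4, 9, 4, 8, 6]
```

Note how the impulsive sample 52 is suppressed by the median setting, while the same routine with k = 1 or k = 3 acts as a maximum or minimum filter.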
Several algorithms have been proposed for rank order filters that are based on data sorting. Although these algorithms are suitable for software implementations, they usually result in inefficient hardware structures, since they process the input vectors at the word level. Implementations based on stack filters have an area-time complexity of O(n²), and the hardware complexity increases very rapidly with window size (m) [7].
In recent years, some innovative bit-serial structures for rank-order filters have been presented, which are mostly based on majority-decision algorithms [4, 6, 9]. Yet, the majority function is typically hard to realize using conventional Boolean building blocks, since it requires a large number of gates and a large logic depth. Consequently, such structures suffer from speed and area limitations, especially if the window size becomes larger than 10 vectors. Also, most of the conventional realizations result in a fixed rank and a fixed window size, which limits the flexibility of their application.
In this paper, we present a new architecture for the realization of fully programmable ROFs based on threshold logic gates, resulting in a very compact and highly modular structure. The architecture consists of a regular array that is composed of only two types of building blocks, and it allows the construction of filter structures of arbitrary size. The processing efficiency of the proposed architecture is significantly increased by fine-grain pipelining in both directions within the array, where the clock frequency remains essentially independent of the window size (m) and word length (n).
The organization of this paper is as follows: The outline of a simple bit-serial algorithm for rank ordering is presented in Section 2. In Section 3, the implementation of the programmable ROF architecture is discussed, and the main building blocks are presented. The structure and operation of the multi-input majority (voting) function blocks are presented in Section 4, followed by a discussion of the prototype ROF circuit and its test results in Section 5. Finally, the conclusions are summarized in Section 6.
2. THE RANK ORDERING ALGORITHM
2.1. Algorithm Description
A bit-serial algorithm first proposed in [6] was chosen as the basis of the programmable rank-order filter architecture implemented in this work. In this algorithm, the problem of finding a rank-order selection for n-bit long words is reduced to finding n rank-order selections among 1-bit numbers.
The algorithm can be summarized as follows:
Figure 3 shows an example where five 8-bit words (denoted P through T, with decimal values of 184, 105, 194, 117 and 75, respectively) are being rank-ordered using the algorithm described above. The window size is m = 5 and the rank is k = 3, indicating that the third smallest among these five numbers is being found in 8 steps. Note that the main bit-level operation at each step amounts to a majority (rank) decision among the m bits of the same bit-plane.
In Step 1, the most significant bit-plane is processed, and the output is determined as "0" since only two of the MSBs are equal to "1". Notice that the voting function performed by the programmable majority gate at this bit-plane corresponds to a (≥3-out-of-5) function. Immediately following this majority decision, the MSBs that do not coincide with the bit-plane output (i.e., the MSBs of vector P and vector R) are propagated down to the less significant bit-planes, thereby eliminating these two vectors as potential candidates for output.
FIGURE 3 An illustration of the rank-ordering algorithm, for five 8-bit words.
ARCHITECTURE FOR PROGRAMMABLE RANK ORDER FILTERS 119
In Step 2, the majority output is "1" and all 5 bits at this bit-plane match the output; therefore, the process is continued on to the next bit-plane. In Step 3, vector T is eliminated by propagating its "0"-bit down to the less significant bit-planes. It is worth noting that in some cases, the elimination process may determine the correct output before all bit-planes are processed (as in the example, where the output vector S is essentially found after the 4th step). Yet in our implementation, the algorithm is allowed to progress until it reaches the least significant bit-plane, simply to preserve the timing integrity in subsequent runs. Also note that the algorithm allows bit-level pipelining: as the process propagates through the less significant bit-planes, the more significant bit-planes can start operating on the next input vector set.
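The walk-through above can be reproduced with a short behavioral model. It is a minimal software sketch of the bit-plane procedure as described in the text, not a description of the circuit; the function name `bit_serial_rank` and the `frozen` bookkeeping list are hypothetical names introduced here.

```python
def bit_serial_rank(words, k, n_bits):
    """Bit-serial rank selection, processed from the MSB plane to the LSB plane.

    At each bit-plane the output bit is "1" if at least k of the m presented
    bits are "1" (a >=k-out-of-m vote). A word whose bit disagrees with the
    output is eliminated: from then on it presents that same bit to every
    less significant bit-plane.
    """
    m = len(words)
    frozen = [None] * m  # None: still a candidate; 0 or 1: propagated bit
    result = 0
    for plane in range(n_bits - 1, -1, -1):
        bits = [
            f if f is not None else (w >> plane) & 1
            for w, f in zip(words, frozen)
        ]
        out = 1 if sum(bits) >= k else 0
        for j in range(m):
            if frozen[j] is None and bits[j] != out:
                frozen[j] = bits[j]  # eliminate word j, propagate its bit
        result = (result << 1) | out
    return result


# Words P..T of the example, window size m = 5, rank k = 3, word length 8.
print(bit_serial_rank([184, 105, 194, 117, 75], k=3, n_bits=8))  # 117 (vector S)
```

With the (≥k-out-of-m) vote, the model returns the k-th largest word; for m = 5 and k = 3 this coincides with the third smallest, i.e., the median.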
2.2. Realization of the Algorithm
The bit-serial operation flow of the algorithm described above suggests a very simple bit-level pipelined data path architecture. Figure 4 shows the conceptual hardware implementation of the operations associated with one bit-plane.
Note that each bit-plane module consists of two main blocks:
1. The modifier/selector (propagator) block, whose function is to store and to shift the actual data and to calculate the selector signal for the next processing block.
2. The majority or rank decision block, which determines the output bit as a function of (m) bits in each bit-plane, with a (≥k-out-of-m) operation.
In the modifier/selector block, also called a ROF-cell, the output of the majority function is compared with the corresponding data bit, using an XNOR gate. The result of this XNOR operation is then combined (AND operation) with the select signal originating from the previous block. This provides the information whether the data bit taken from the previous block is a propagating one or not. If the data bit is a propagating one, then the new select signal will be "0", indicating that this data bit will continue propagating unchanged through the following stages. Otherwise, the select signal will only depend on the result of the comparison of the filter-slice output with the data bit. A typical layout of the ROF-cell is also shown in Figure 5. It can be seen that the layout design of this cell permits modular expansion of the array. The architecture in Figure 7 consists of three main blocks: input shift registers, ROF processing core, and output shift registers. To allow bit-level pipelined operation, the input bits are ordered using a staggered shift register array (Fig. 7) [7].
This increases the processing efficiency.
3. The proposed ROF has a modular architecture which enables easy expandability of the window size and word length of the input vectors without a dramatic change in performance.
4. The overall circuit complexity increases linearly both with maximum window size (m) and with word length (n).
5. The clock frequency is essentially independent of the input window size (m), and the overall latency of the pipeline is (n-1) clock cycles.
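The scaling statements above can be turned into a quick back-of-the-envelope calculation. The sketch below assumes one ROF-cell per input word per bit-plane and one majority gate per bit-plane (an assumption based on the description of the bit-plane module), and one n-bit output word per clock cycle; the function name `rof_figures` is hypothetical.

```python
def rof_figures(m, n, f_clk_hz):
    """Rough first-order figures for an m-word, n-bit pipelined ROF.

    Assumptions (not taken from measured data): m ROF-cells and one
    majority gate per bit-plane, n bit-planes, one output word per clock.
    """
    return {
        "rof_cells": m * n,            # grows linearly in m and in n
        "majority_gates": n,           # one m-input gate per bit-plane
        "latency_cycles": n - 1,       # pipeline latency stated in the text
        "data_rate_bps": f_clk_hz * n  # n output bits per clock cycle
    }


# Window size 63, word length 16 bits, 50 MHz sampling clock.
print(rof_figures(63, 16, 50_000_000)["data_rate_bps"])  # 800000000
```

For the 63 × 16 configuration at 50 MHz this gives 800 Mb/s, in agreement with the data rate quoted in the abstract.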
4. REALIZATION OF THE MAJORITY GATE USING CAPACITIVE THRESHOLD LOGIC
The design of the modifier/selector block shown in Figure 4 is relatively straightforward using conventional CMOS logic gates, whereas the majority decision block presents a bottleneck with conventional realization methods, both in terms of circuit complexity and in terms of logic depth. This well-known limitation has traditionally been a significant impediment for the hardware realization of similar structures [1, 8, 11, 12]. In this work, the majority gates are therefore realized using Capacitive Threshold Logic (CTL) [5]. The most significant advantage of the CTL-based realization is that it allows the construction of a very-large-input programmable majority (voting) function as a single-level logic gate. As presented in [5], a 31-input programmable majority gate based on CTL is almost three times faster than a full-custom standard CMOS realization [10] and occupies approximately one third of the area, which results in an area-delay performance increase of nearly an order of magnitude. In addition, the CTL-based majority gate can be easily integrated with the conventional CMOS gates used in the architecture, since the input and output signals are fully CMOS compatible.
The circuit layout of a 15-input single-level programmable majority gate based on CTL is shown in Figure 8, occupying a silicon area of (320 µm × 70 µm) using 0.8 µm CMOS technology. The operation of the 31-input majority gate is illustrated in Figure 9, where the voting threshold is set to 16 for the first 7 consecutive inputs, and then the threshold is reprogrammed to 5 for the next 8 inputs. This circuit is capable of producing a majority output bit with a typical propagation delay of about 3 ns.
A larger version of the CTL-based programmable majority gate with 63 parallel inputs has also been designed for use as the main building block of the ROF prototype chip. The 63-input gate occupies a silicon area of (625 µm × 130 µm), and its worst-case input-to-output propagation delay remains less than 8 ns. Notice that one of the most important advantages of CTL-based realization of majority gates is that the silicon area increases strictly linearly with the input size, while the propagation delay of this one-level structure remains a weak logarithmic function of fan-in.
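At the logic level, the programmable majority gate behaves as a threshold function with an adjustable threshold. The following behavioral sketch models only that input/output relation and says nothing about the capacitive circuit itself; the function name `programmable_majority` is hypothetical.

```python
def programmable_majority(bits, threshold):
    """Behavioral model of a (>=threshold-out-of-m) voting gate.

    Returns 1 if at least `threshold` of the m input bits are 1, else 0.
    The threshold plays the role of the programmable filter rank.
    """
    if not 1 <= threshold <= len(bits):
        raise ValueError("threshold must lie between 1 and the fan-in")
    return 1 if sum(bits) >= threshold else 0


# A 31-input gate: 15 ones and 16 zeros.
inputs = [1] * 15 + [0] * 16
print(programmable_majority(inputs, 16))  # 0: fewer than 16 ones
print(programmable_majority(inputs, 5))   # 1: at least 5 ones
```

Reprogramming the threshold from 16 to 5 changes the output for the same input pattern, which is the property that makes the filter rank user-programmable.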
5. EXPERIMENTAL RESULTS AND DESIGN VALIDATION
A prototype ROF test circuit has been designed using a conventional 0.8 µm double-polysilicon CMOS process, to validate the main operation principles of the architecture. The prototype blocks consist of four bit-level pipeline stages, each of which contains a 63-bit programmable majority gate to handle the rank selection. The top-level layout of the fabricated prototype circuit is shown in Figure 10. The circuit occupies a silicon area of (800 µm × 1150 µm).
FIGURE 8 Layout of the 15-bit input majority gate. The silicon area is (320 µm × 70 µm) using 0.8 µm CMOS technology.
FIGURE 9 The illustration of the 31-input majority gate operation.
FIGURE 10 Top-level layout of the fabricated prototype circuit.
The operation of the ROF architecture is demonstrated with detailed measurement results in Figure 11, which were obtained using an HP 16500C Logic Analysis System. Here, the window size is 4, and the rank is selected to be 2, meaning that the second largest input word in the sample window will have to be selected. All four bits of the input and the output are displayed individually. The input words for this sample window are "0101", "0110", "0111" and "1000". The correct output, "0111", appears after 3 clock cycles, demonstrating that the prototype circuit produces the correct output sequence with a latency of 3 clock cycles.
Finally, to serve as a dramatic demonstration of the proposed filter architecture, the top-level layout of a programmable ROF core with a maximum window size of 63 samples and a word-length of 16 bits is shown in Figure 13 . To our knowledge, the design of a rank order filter of this complexity has not been attempted before. 
