We present a new scalable architecture for the realization of fully programmable rank order filters (ROF), based on Capacitive Threshold Logic (CTL) gates. Variants of ROFs, especially median filters, are widely used in digital signal and image/video processing and image enhancement. The CTL-based realization of the majority gates used in the ROF architecture allows the filter rank and the window size to be user-programmable, using a much smaller silicon area compared to conventional realizations of digital median filters. The proposed filter architecture is completely modular and scalable, and the circuit complexity grows only linearly with maximum window size and with word length. Detailed post-layout simulations of the ROF prototype circuit indicate that the new architecture can accommodate sampling clock rates of up to 50 MHz, corresponding to an effective data processing rate of 800 Mb/s for a filter with window size 63 and word length of 16 bits.
Introduction
The rank order filter (ROF) is a non-linear digital filter which determines the i-th ranking element in a given window consisting of binary encoded input words (Fig. 1). Special cases of rank order filters are median, minimum and maximum filters, where the outputs are the median, the minimum and the maximum values of the input words, respectively [1]. Variants of ROFs are widely used in digital signal and image/video processing because of their non-linear characteristics. In particular, median filters have found many applications in digital image enhancement, such as reducing the high frequency and impulsive noise in digital images without extensive blurring and edge destruction [2], [3]. Other successful applications of ROFs include the smoothing of noisy pitch contours in speech signals, data compression in block truncation coding schemes, speckle noise reduction in coherent imaging systems, and preprocessing data for machine vision.
Several algorithms have been proposed for rank order filters that are based on data sorting. Although these algorithms are suitable for software implementation, they result in inefficient hardware structures, since they process the input vectors at the word level. Implementations based on stack filters have an area-time complexity of O(n^2), and the hardware complexity increases very rapidly with window size (m).
In recent years, some innovative bit-serial structures for rank order filters have been presented, which are mostly based on majority-decision algorithms [4], [9]. Yet, the majority function is typically hard to realize using conventional Boolean building blocks, since it requires a large number of gates and a large logic depth.
Consequently, such structures suffer from speed and area limitations, especially if the window size becomes larger than 10 words. Also, most of the conventional realizations result in a fixed rank and a fixed window size, which limits the flexibility of their application. In this paper, we present a new architecture to realize a fully programmable ROF, based on Capacitive Threshold Logic (CTL) gates. The CTL realization of the majority gates [5] used in the ROF architecture allows the filter rank and the window size to be user-programmable, using a much smaller silicon area. In Section 3, the implementation of a programmable ROF architecture is discussed. The conclusions are summarized in Section 4.
The Rank Ordering Algorithm

Algorithm Description
A bit-serial algorithm first proposed in [6] was chosen as the basis of the programmable rank-order filter architecture implemented in this work. In this algorithm, the problem of finding a rank-order selection for n-bit-long words is reduced to finding n rank-order selections for 1-bit numbers.
The algorithm starts by processing the most significant bits (MSB) of the m = (2N+1) words in the current window through an m-input programmable majority gate, to yield the MSB of the desired filter output. This output is then compared with the MSBs of the window elements. The vectors whose MSB is not equal to the filter output have their MSB propagated down by one position, replacing the less significant bits of the corresponding words. This process is continued for the following bits. Thus, any bit that is not equal to the corresponding stage output is propagated down to the less significant positions, until the least significant bit is processed. This process ensures that at a later stage, any number which was greater than (or less than) the i-th ranked number can be identified, and the i-th ranked bit sorted out. Figure 2 shows an example where five 8-bit words (denoted P through T, with decimal values of 184, 105, 194, 117 and 75, respectively) are being rank-ordered using the algorithm described above. The window size is m=5 and the rank is r=3, indicating that the third smallest among these five numbers is being found in 8 steps. Note that the main bit-level operation at each step amounts to a majority (rank) decision among the m bits of the same bit-plane. In the example, the final result after Step 8 corresponds to word S, which has the decimal value of 117.
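The bit-serial procedure described above can be sketched in software as follows. This is only a behavioral illustration, not the hardware realization: the function name `rank_order_select`, the `frozen` list and the convention that rank 1 denotes the smallest element are assumptions made for this sketch, and the rank decision is modeled as a simple count of ones against the threshold m - r + 1.

```python
def rank_order_select(words, rank, n_bits):
    """Return the rank-th smallest word using bit-serial rank decisions.

    Assumption for this sketch: rank 1 is the minimum, rank m the maximum.
    """
    m = len(words)
    if not 1 <= rank <= m:
        raise ValueError("rank must lie between 1 and the window size")

    # frozen[k] is None while word k still matches every output bit so far.
    # Once a bit of word k differs from the stage output, that bit is
    # stored here and replaces all less significant bits of the word.
    frozen = [None] * m
    result = 0

    for pos in range(n_bits - 1, -1, -1):          # MSB first
        bits = [(w >> pos) & 1 if f is None else f
                for w, f in zip(words, frozen)]
        # Rank decision on one bit-plane: the rank-th smallest bit is 1
        # exactly when at least (m - rank + 1) of the m bits are 1.
        out = 1 if sum(bits) >= m - rank + 1 else 0
        for k, b in enumerate(bits):
            if frozen[k] is None and b != out:
                frozen[k] = b                      # propagate this bit downwards
        result = (result << 1) | out
    return result


window = [184, 105, 194, 117, 75]                  # words P through T
print(rank_order_select(window, rank=3, n_bits=8))
```

For the five words of the example, rank 3 selects the third smallest value, in agreement with the result stated for Step 8.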
Realization of the Algorithm
The bit-serial operation flow of the algorithm described above suggests a very simple bit-level pipelined data path architecture. In the Modifier/Selector block, the output of the majority function is compared with the corresponding data bit, using an XNOR gate. The result of this XNOR operation is then combined (AND operation) with the select signal originating from the previous block. This provides the information whether the data bit taken from the previous block is a propagating one or not. If the data bit is a propagating one, then the new select signal will be 0, indicating that this data bit will continue propagating unchanged through the following stages. Otherwise, the select signal will only depend on the result of the comparison of the filter-slice output with the current data bit. Identical 1-bit filter slices can be used in sequence (cascade configuration) in order to process input vectors of arbitrary bit-length. Thus, the filter throughput can be increased by bit-level pipelining. The modular structure of the one-bit slice described above also allows for scalable realization of the ROFs with different window sizes and word lengths.
Figure caption: Gate-level structure of a ROF cell and the corresponding layout, allowing modular expansion.
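The select logic of the Modifier/Selector block can be illustrated with a small behavioral model. The sketch below is an assumption-laden reading of the text, not the gate-level circuit of the paper: the function names (`cell_data`, `cell_select`, `filter_slice`, `cascade`) are hypothetical, select = 1 is taken to mean "word still a candidate", and the multiplexing of the data bit is inferred from the description of propagating bits.

```python
def plane_majority(bits, rank):
    """Rank decision on one bit-plane (rank-th smallest of the bits)."""
    return 1 if sum(bits) >= len(bits) - rank + 1 else 0


def cell_data(select_in, data_prev, bit_in):
    # select_in == 1: the word is still a candidate, use its own input bit.
    # select_in == 0: the word was excluded earlier, keep the propagating bit.
    return bit_in if select_in else data_prev


def cell_select(select_in, data, maj_out):
    xnor = 1 - (data ^ maj_out)        # 1 when data bit equals slice output
    return select_in & xnor            # AND with the previous select signal


def filter_slice(selects, datas, bits_in, rank):
    """One 1-bit filter slice: m cells feeding one majority decision."""
    datas = [cell_data(s, d, b) for s, d, b in zip(selects, datas, bits_in)]
    out = plane_majority(datas, rank)
    selects = [cell_select(s, d, out) for s, d in zip(selects, datas)]
    return selects, datas, out


def cascade(words, rank, n_bits):
    """Chain n_bits identical slices, MSB first (sequential model of the pipeline)."""
    selects, datas, result = [1] * len(words), [0] * len(words), 0
    for pos in range(n_bits - 1, -1, -1):
        bits_in = [(w >> pos) & 1 for w in words]
        selects, datas, out = filter_slice(selects, datas, bits_in, rank)
        result = (result << 1) | out
    return result


print(cascade([184, 105, 194, 117, 75], rank=3, n_bits=8))
```

Once a select signal has dropped to 0 it stays at 0 in every later slice, which is the behavior the text describes for a propagating data bit.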
System Components
There are two main blocks in the architecture, the ROF-cell and the Majority Decision gate. By using these two blocks, a programmable rank-order filter of any window size and word-length can be realized. The word-length dictates the number of the majority decision gates, whereas the window size determines the number of ROF-cells driving one of these majority gates. The programmable majority decision gates are realized using the capacitive threshold logic (CTL) circuit architecture presented earlier [5]. This allows simple implementation of programmable majority gates with up to 63 parallel inputs, using a very small silicon area (625 µm x 130 µm for a 63-bit majority gate).
In comparison, a classical realization of the 63-bit majority gate would require an equivalent of 63 6-bit full-adder circuits, arranged in a network with a logic depth of 64 (synthesized from an HDL description). Figure 3 shows the ROF-cell block realization at gate level. At each positive clock edge, the corresponding select and data signals are fed to the next blocks.
During a clock period, the majority gate output feeds all the ROF-cells in its corresponding bit-level. The signal flow between the ROF cells and the majority gates is shown in Figure 4. The modular architecture consisting of only two major blocks enables fully scalable construction of filter structures of arbitrary size.
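The logical function of a programmable majority gate can be written as a threshold function, which the following sketch illustrates. It models only the Boolean behavior, not the capacitive circuit itself; the names `threshold_gate` and `programmable_majority`, the unit weights, and the idea of tying unused inputs to 0 for windows smaller than the maximum are assumptions made for illustration and are not taken from the paper.

```python
def threshold_gate(inputs, weights, threshold):
    """Generic threshold function: 1 if the weighted input sum reaches the threshold."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total >= threshold else 0


def programmable_majority(bits, rank, max_inputs=63):
    """Rank decision for a window of len(bits) <= max_inputs one-bit inputs.

    Assumptions of this sketch: rank 1 selects the smallest bit, unused
    inputs are tied to 0, and all weights are equal to 1.
    """
    window_size = len(bits)
    if window_size > max_inputs:
        raise ValueError("window exceeds the number of gate inputs")
    if not 1 <= rank <= window_size:
        raise ValueError("rank must lie between 1 and the window size")
    padded = list(bits) + [0] * (max_inputs - window_size)
    weights = [1] * max_inputs
    # The rank-th smallest bit is 1 when at least (m - rank + 1) inputs are 1.
    return threshold_gate(padded, weights, window_size - rank + 1)


# Median of five bits (rank 3) is the ordinary 3-out-of-5 majority.
print(programmable_majority([1, 0, 1, 1, 0], rank=3))
print(programmable_majority([1, 0, 1, 0, 0], rank=3))
```

In this model, changing the rank or the window size only changes the threshold value and the number of active inputs, which is one way to read the programmability claimed for the CTL gate.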
Overall System Architecture
The top level block diagram of the programmable ROF design is shown in Figure 5. The architecture consists of three main blocks: input shift registers, ROF processing core, and output shift registers. To allow bit-level pipelined operation, the input bits are ordered using a staggered shift register array (Fig. 5).
The ROF core has (n·m) ROF cells, where m = (2N+1) is the window size and n is the bit-length of the input words. The ROF cells processing the bits of the same significance provide the necessary inputs to the corresponding Majority Decision block, which determines the filter output bit of that level. This output bit is fed to the output shift registers and back to the ROF cells, to be used in determining the select and data signals which will be the inputs of the next stage.
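The sizing relations stated above can be checked with a few lines of arithmetic. The helper below is a hypothetical illustration: the name `rof_resources` and the dictionary keys are not from the paper, and the throughput formula assumes that one complete n-bit output word is produced per clock cycle once the pipeline is full.

```python
def rof_resources(window_size, word_length, clock_hz):
    """Block counts and data rate for an m-word window of n-bit words."""
    return {
        "rof_cells": window_size * word_length,   # n * m cells in the core
        "majority_gates": word_length,            # one gate per bit level
        "throughput_bps": clock_hz * word_length, # n bits per clock cycle
    }


# Window size 63, 16-bit words, 50 MHz sampling clock.
print(rof_resources(63, 16, 50_000_000))
```

Both block counts grow linearly in m and n, and the data rate for these parameters works out to 800 Mb/s.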
Advantages of the Proposed Architecture
The realization of fully programmable rank order filters has traditionally been a very challenging design problem, mainly due to the fact that the rank selection function (programmable majority function) is extremely hardware-intensive using conventional design approaches. As a result, most of the design efforts so far have either been constrained to median-only filters without any rank selection capability, and/or to relatively small window sizes [7], [8].
The CTL-based ROF architecture presented here is superior to other ROF implementations, with its following capabilities:
1. The CTL realization of the majority gates used in the ROF architecture allows the filter rank and the window size to be fully programmable, using a much smaller silicon area.
2. The rank-ordering algorithm implemented with this architecture does not require the elements of the input window to be pre-ordered, as opposed to other, stack-based ordering algorithms [7].
3. The proposed ROF has a modular architecture which enables easy expandability of the window size and bit-length of the input words without a dramatic change in performance.
4. The overall circuit complexity increases linearly with maximum window size (m) and with word length (n).
Conclusion
In this paper, we have presented a new architecture for realizing a fully programmable ROF, based on the Kar-Pradhan rank ordering algorithm and Capacitive Threshold Logic (CTL) majority gates. The bit-serial realization of the rank ordering algorithm offers a simple pipelined filter architecture which is highly modular and easily expandable. The CTL realization of the majority gates used in the ROF architecture allows the filter rank and the window size to be user-programmable, resulting in a much smaller silicon area. In addition, the CTL-based majority gates enable a much simpler overall filter architecture compared to conventional digital median filter realizations.
