A general-purpose median filter configuration consisting of two single-chip median filters is proposed. One of the chips is designed for applications requiring variable word length and variable window size, whereas the other is for real-time applications. The architectures of the chips are based on odd/even transposition sorting. The chips are implemented in 3-µm M2CMOS by using full-custom VLSI design techniques. Together with reasonable external hardware, the chips can be used to realize many median filtering techniques. In this paper, the VLSI design procedure of the chips and their applications to different median filtering techniques for image processing are presented.
INTRODUCTION
The median of an odd number of elements is defined as the middle element when the elements are sorted. The output of a median filter is the median of its input data, and the resulting nonlinear smoothing filter can remove impulsive noise from signals and images while preserving edge information [1]. Such filters are frequently used in many signal and image processing applications. In terms of impulsive noise suppression, edge preservation, and ease of design, the performance of median filters is better than that of other smoothing filters such as linear filters [2] and generalized mean filters [3].
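The definition of the median, and why it suppresses impulses where a linear smoother does not, can be illustrated with a small 1-D comparison. The sketch below is illustrative only: the helper names (median, running_filter, mean) and the choice to copy the end samples unchanged are assumptions of this example, not part of the chips described here.

```python
def median(values):
    """Median of an odd number of elements: the middle element after sorting."""
    if len(values) % 2 == 0:
        raise ValueError("expected an odd number of elements")
    return sorted(values)[len(values) // 2]


def running_filter(signal, w, reduce):
    """Slide a window of odd size w over signal; end samples are copied unchanged."""
    half = w // 2
    out = list(signal)
    for i in range(half, len(signal) - half):
        out[i] = reduce(signal[i - half:i + half + 1])
    return out


def mean(values):
    return sum(values) / len(values)


# A step edge (10 -> 50) corrupted by one impulse at index 2.
noisy = [10, 10, 255, 10, 10, 50, 50, 50]
print(running_filter(noisy, 3, median))  # [10, 10, 10, 10, 10, 50, 50, 50]
print(running_filter(noisy, 3, mean))    # impulse smeared over three samples, edge blurred
```

The median output removes the impulse and keeps the step sharp, while the running mean spreads the impulse over its neighbors and turns the step into a ramp.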
In 1-D and 2-D standard median filtering applications, a window of odd size w moves over the sampled values of the signal or image; the median of the samples within the window is computed and written as the output element at the location of the window center. Theoretical analyses and applications of median filters can be found in the literature [4, 5]. Median filters are mostly implemented on general-purpose computers [6, 7]; however, there are also hardware implementations for faster filtering [8]. Because of the low VLSI cost of sorting structures, most hardware median filtering algorithms are based on sorting [9].
The window size of the median filter and the word length of the elements differ between applications, and the required speed of the filtering operation also varies with the application. To meet these changing demands, a general-purpose VLSI median filter unit consisting of two single-chip median filters, one extensible and one real-time, is designed. The extensible median filter chip is designed for applications requiring variable word lengths and variable window sizes, whereas the real-time median filter chip is for real-time median filtering applications. The architectures of the chips are bit-level pipelined systolic structures based on odd/even transposition sorting. The chips are implemented in 3-µm M2CMOS by using full-custom VLSI design techniques. In the following sections, the architectures, VLSI implementations, and some possible applications of the chips are presented.
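The standard 2-D median filtering procedure described at the start of this section can be sketched in software as follows. This is a minimal reference model, assuming a k × k neighborhood (so the window holds w = k² samples, w = 9 for k = 3) and assuming that border pixels the window cannot cover are copied unchanged; the function name median_filter_2d is hypothetical.

```python
def median_filter_2d(image, k=3):
    """Standard 2-D median filter over k x k windows (w = k*k samples, k odd).

    Pixels closer than k//2 to the border are copied unchanged; this border
    policy is an assumption of the sketch.
    """
    if k % 2 == 0:
        raise ValueError("k must be odd")
    half = k // 2
    rows, cols = len(image), len(image[0])
    out = [list(row) for row in image]
    for r in range(half, rows - half):
        for c in range(half, cols - half):
            window = [image[r + dr][c + dc]
                      for dr in range(-half, half + 1)
                      for dc in range(-half, half + 1)]
            # The median of the w samples is written at the window center.
            out[r][c] = sorted(window)[len(window) // 2]
    return out


# Flat 5 x 5 image with one impulse, and an image with a vertical step edge.
flat = [[20] * 5 for _ in range(5)]
flat[2][2] = 255
edge = [[0, 0, 100, 100, 100] for _ in range(5)]
print(median_filter_2d(flat)[2][2])    # 20: the impulse is removed
print(median_filter_2d(edge) == edge)  # True: the edge is preserved
```

Sorting the whole window is the simplest correct choice here; it mirrors the sorting-based hardware approach mentioned above rather than any faster software median algorithm.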
ARCHITECTURES

Extensible Median Filter Architecture
The extensible median filter is an odd/even transposition sorting network, a pipelined regular structure consisting of 9 compare-and-swap stages (Fig. 1.a). Each stage consists of 5 bitwise compare-and-swap units. Each of these units compares the two one-bit numbers at its inputs and interchanges them if necessary so that the larger one is at the "top". At the output of the last stage, the data are sorted so that the largest is at the top and the median is in the middle. At each clock, one bit from each word (a total of 9 bits) enters the network, and one bit of the median is obtained at the output. Both at the input and at the output, the bits flow from the most significant toward the least significant. Because of this bit-serial data flow, the structure allows an arbitrary word length L.
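The sorting behavior of the network can be checked with a word-level software model. The sketch below is an assumption-laden simplification: it moves whole words per line instead of one bit per clock, and it omits the fifth unit per stage that connects to the extension lines. The function names are hypothetical.

```python
def compare_and_swap(a, b):
    """Return (larger, smaller): the larger value goes to the 'top' line."""
    return (a, b) if a >= b else (b, a)


def odd_even_transposition_sort(words):
    """Sort into descending order (index 0 = top) using len(words) stages.

    Even-numbered stages compare lines (0,1), (2,3), ...; odd-numbered stages
    compare (1,2), (3,4), ....  With n lines, n stages are enough to sort.
    """
    x = list(words)
    n = len(x)
    for stage in range(n):
        for i in range(stage % 2, n - 1, 2):
            x[i], x[i + 1] = compare_and_swap(x[i], x[i + 1])
    return x


def network_median(words):
    """The median is the value on the middle line after the last stage."""
    if len(words) % 2 == 0:
        raise ValueError("expected an odd number of words")
    return odd_even_transposition_sort(words)[len(words) // 2]


window = [12, 200, 7, 45, 45, 3, 99, 150, 60]
print(odd_even_transposition_sort(window))
print(network_median(window))  # 45
```

Nine stages suffice for nine lines, which is why the chip uses exactly 9 compare-and-swap stages for w = 9.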
The bitwise compare-and-swap unit (CSU1) is a finite state machine that has three legal operation states: equal, pass, and swap. CSU1 is set to the equal state at the end of each data word by a reset signal; the reset signal flows through the stages of the network at a rate of one stage per clock cycle by means of the pipelined delay units. CSU1 stays in the equal state as long as its inputs are equal. Once they differ, it locks itself into the pass or swap state, depending on its inputs, and stays in that state until it is reset. The state diagram and the operations in the different states are given in Fig. 1.b.
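The three-state behavior of CSU1, and why an MSB-first bit-serial network still delivers the exact median, can be modelled as below. This is a hypothetical software model: the class name reuses the paper's unit name for readability, constructing fresh units stands in for the per-word reset signal, and pipeline delays are ignored (each bit slice crosses all stages in one step, which changes the timing but not the bits that come out).

```python
EQUAL, PASS, SWAP = "equal", "pass", "swap"


class CSU1:
    """Bit-serial compare-and-swap unit modelled as a three-state FSM."""

    def __init__(self):
        self.state = EQUAL  # a fresh unit plays the role of the per-word reset

    def step(self, top_bit, bottom_bit):
        # Lock at the first (most significant) bit position where the inputs differ.
        if self.state == EQUAL and top_bit != bottom_bit:
            self.state = PASS if top_bit > bottom_bit else SWAP
        if self.state == SWAP:
            return bottom_bit, top_bit
        return top_bit, bottom_bit  # PASS, or EQUAL with identical bits


def bit_serial_median(words, L):
    """Feed L-bit words MSB first through len(words) stages of CSU1 units."""
    n = len(words)
    stages = [[CSU1() for _ in range(s % 2, n - 1, 2)] for s in range(n)]
    median = 0
    for k in range(L - 1, -1, -1):  # most significant bit first
        bits = [(w >> k) & 1 for w in words]
        for s, units in enumerate(stages):
            for unit, i in zip(units, range(s % 2, n - 1, 2)):
                bits[i], bits[i + 1] = unit.step(bits[i], bits[i + 1])
        median = (median << 1) | bits[n // 2]  # one median bit per clock
    return median


print(bit_serial_median([12, 200, 7, 45, 45, 3, 99, 150, 60], L=8))  # 45
```

While a unit is in the equal state its two input bits are identical, so routing them either way gives the same output; once it locks, it routes the rest of both words consistently, so each line keeps carrying a whole word.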
In the extensible median filter structure given in Fig. 1.a, the upper and lower extension I/Os (x's and y's) are used to extend the filter to larger window sizes. For w = 9, the upper and lower extension inputs are connected to logic 1's and logic 0's, respectively, so that the corresponding compare-and-swap units act as delay units. On the other hand, the design allows many of these chips to be interconnected to form median filters for w > 9. The extensible median filter generates its outputs with a delay of w + L clocks, and once the network is full, it finds one L-bit median every L clocks. Although the resulting speed may be sufficient for real-time median filtering of 512 × 512 frames with L < 3, it is not enough for real-time filtering of 1024 × 1024 frames with L > 1.
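Tying lines to all ones and all zeros works because those values can never land in the middle of the sorted order. The sketch below illustrates this for the case of a shorter odd window evaluated on a 9-input sorter, the idea behind the adaptive-length use of the chip described under APPLICATIONS. It assumes Python's sorted() as a stand-in for the bit-serial network; padded_median and its parameters are hypothetical names.

```python
def padded_median(samples, L, n_inputs=9):
    """Median of an odd number w <= n_inputs of L-bit samples on an n_inputs-wide sorter.

    The unused inputs are split evenly: half are tied to all ones (2**L - 1,
    the largest L-bit value) and half to all zeros. The all-ones values sort
    at or above every real sample and the all-zeros values at or below, so the
    middle output is still the median of the real samples.
    """
    w = len(samples)
    if w % 2 == 0 or w > n_inputs:
        raise ValueError("need an odd window size w <= n_inputs")
    pad = (n_inputs - w) // 2
    inputs = list(samples) + [(1 << L) - 1] * pad + [0] * pad
    # sorted() stands in for the 9-input odd/even transposition network here.
    return sorted(inputs)[n_inputs // 2]


print(padded_median([7, 200, 3], L=8))      # 7
print(padded_median([5, 1, 4, 2, 3], L=8))  # 3
```

With p padding values on each side, the middle position of the 9 outputs is index p + (w - 1)/2, which is exactly the median position among the w real samples.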
Real-Time Median Filter Architecture
The real-time median filter is designed by interconnecting 8 odd/even transposition sorter blocks in parallel [9] (Fig. 2.a). In this network, the data enter in such a way that the most significant bits go to the first block, the second most significant bits to the second block, and so on. The bitwise compare-and-swap unit (CSU2) used in this network is slightly different from that of the extensible one, because the "swap" or "pass" information flows from the upper to the lower block: each compare-and-swap unit takes this information, uses it, updates it, and sends it out (Fig. 2.b). For proper timing, delay units are included at the input and output of the network.
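The bit-parallel scheme, in which the pass/swap decision flows from the most significant block down to the least significant one, can be sketched as a chain of cells. This is a hypothetical functional model: csu2 is an illustrative name, the state encoding is assumed to match the three CSU1 states, and the delay units that align the blocks in time are not modelled.

```python
EQUAL, PASS, SWAP = "equal", "pass", "swap"


def csu2(top_bit, bottom_bit, state_in):
    """One bit-slice cell: take the decision from the more significant block,
    update it with this block's bits, route the bits, and pass the decision on."""
    state = state_in
    if state == EQUAL and top_bit != bottom_bit:
        state = PASS if top_bit > bottom_bit else SWAP
    out = (bottom_bit, top_bit) if state == SWAP else (top_bit, bottom_bit)
    return out, state


def word_compare_and_swap(a, b, L=8):
    """Compare-and-swap two L-bit words with L csu2 cells, MSB block first."""
    state = EQUAL
    top = bottom = 0
    for k in range(L - 1, -1, -1):
        (t, u), state = csu2((a >> k) & 1, (b >> k) & 1, state)
        top |= t << k
        bottom |= u << k
    return top, bottom


def real_time_median(words, L=8):
    """Odd/even transposition network whose every compare-and-swap is bit-sliced."""
    x = list(words)
    n = len(x)
    for stage in range(n):
        for i in range(stage % 2, n - 1, 2):
            x[i], x[i + 1] = word_compare_and_swap(x[i], x[i + 1], L)
    return x[n // 2]


print(word_compare_and_swap(0b01100000, 0b01011111))        # (96, 95)
print(real_time_median([12, 200, 7, 45, 45, 3, 99, 150, 60]))  # 45
```

Because every block processes a different bit of the same words, a whole median emerges per clock once the pipeline is full, instead of one bit per clock as in the extensible design.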
The real-time median filter has nine 8-bit data inputs, and it generates one 8-bit median per clock. At every clock, three new elements enter the chip, corresponding to the new elements of a sliding 3 × 3 window. Since the clock period is determined by the delay of one compare-and-swap unit (CSU2), current VLSI technology allows CSU2 to be implemented at a speed higher than the real-time operation rate for 1024 × 1024 frames with L = 8.
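The real-time claims reduce to simple arithmetic: one median is needed per pixel per frame. The sketch below uses the clock frequencies reported for the two chips (30 MHz for the extensible chip, 40 MHz for the real-time chip) and assumes a frame rate of 30 frames/s, which the text does not state. All function names are hypothetical.

```python
F_EXTENSIBLE = 30_000_000  # Hz, reported clock of the extensible chip
F_REAL_TIME = 40_000_000   # Hz, reported clock of the real-time chip


def required_rate(size, fps=30):
    """Medians per second needed to filter every pixel of size x size frames."""
    return size * size * fps


def max_word_length(size, fps=30, f_clk=F_EXTENSIBLE):
    """Largest L for which one L-bit median every L clocks keeps up (0 = none)."""
    return f_clk // required_rate(size, fps)


def real_time_chip_keeps_up(size, fps=30, f_clk=F_REAL_TIME):
    """One median per clock must match or exceed the pixel rate."""
    return f_clk >= required_rate(size, fps)


for size in (512, 1024):
    print(size, required_rate(size), max_word_length(size), real_time_chip_keeps_up(size))
# 512 7864320 3 True
# 1024 31457280 0 True
```

Under the 30 frames/s assumption, the extensible chip keeps up with 512 × 512 frames only for very short words and not at all with 1024 × 1024 frames, in line with the bounds stated for it, while the real-time chip's 40 million medians/s exceeds the roughly 31.5 million medians/s that 1024 × 1024 frames require.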
CHIPS
Both the extensible and the real-time median filter architectures are regular arrays of bitwise compare-and-swap units, and their internal communication schemes are simple and regular. This makes their VLSI implementations easy and straightforward [17, 18]. The architectures are mapped to hardware by using the standard CMOS logic style [19] in a 3-µm double-metal n-well process. For generating the chip layouts and simulating them, full-custom VLSI CAD tools [20, 21] are used: Magic for layout editing, and Spice, Rnl, and Esim for simulations. The overall layouts of the chips are shown in Fig. 3. According to the simulation results, the extensible median filter chip can run at clock frequencies up to 30 MHz with a power dissipation of less than 250 mW at this frequency. Its throughput is about 30/L mega medians/s. The chip consists of about 5000 transistors and has an area of 11.7 mm² (3 mm × 3.9 mm) and 28 pins. The real-time median filter chip, on the other hand, can run at clock frequencies up to 40 MHz with a power dissipation of less than 800 mW at this frequency. It generates one median per clock, so its throughput is 40 mega medians/s. It consists of about 22000 transistors and has an area of 45 mm² (6.8 mm × 6.6 mm) and 40 pins.
The testing of the chips is easily accomplished by functional test techniques [22], since the operation of the cells can be selectively probed by using proper test vectors. The test vectors and the expected outputs are generated by software tools written for this purpose. There are 500 test vectors for the extensible median filter chip and 12,000 for the real-time one.
APPLICATIONS
In image processing applications, median filters are used mainly for noise suppression and edge detection. For impulsive noise suppression, the standard median filtering technique is a good choice. However, for suppressing non-impulsive noise, other techniques such as adaptive-length, separable, recursive, and weighted median filtering may be more suitable. For edge detection, generalized, hybrid, and selective median filtering techniques are frequently used. In addition, weighted median filtering can also be used for edge detection by choosing the weight coefficients properly.
The designed median filter chips can be selectively used in a processor environment by means of the chip-enable signal that each chip has. Furthermore, any of the median filtering techniques mentioned above can be realized by using the extensible and/or the real-time median filter chips, with or without reasonable external hardware:
• For the standard median filtering technique, the exact medians of the elements in a window of size w = 9 with arbitrary word length L can be found by using only one extensible median filter chip. For w > 9 with arbitrary L, at most ⌈w/9⌉² chips are required to find the exact medians, where ⌈x⌉ denotes the smallest integer greater than or equal to x. The real-time median filter chip, on the other hand, can find the exact running medians of the elements in a window of fixed size w = 9 with fixed word length L = 8 at the real-time rate.
• The extensible median filter is a favorable choice for realizing adaptive-length median filters [13], since the window size can be changed from 3 to arbitrarily large values with the extensible median filter chip(s) by applying logic 0's or 1's appropriately to the unused inputs.
• For realizing weighted median filters [10], the extensible median filter can be used with a pipelined multiplier that multiplies the input data by the weight coefficients. Since all input data enter the chip directly at each move of the window, an adaptive weighted median filter can be realized by changing the weight coefficients at each window position on the frame.
• A pair of extensible or real-time median filter chips can be used as a selective median filter [15] together with external control logic consisting of two full-word subtracters and a full-word comparator.
• Either the extensible or the real-time median filter chip can be used as a line-recursive median filter [13] by loading the window elements from the frame appropriately.
• The chips can be used to realize separable median filters [11] without any external hardware.
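For the separable median filter, the chips only ever need to compute 1-D medians, which is why no external hardware is required. The sketch below shows the filter itself in its common row-then-column form; the pass order, the border policy (end samples copied unchanged), and the names median_1d and separable_median are assumptions of this example, and the scheduling of the 1-D windows onto the chips is not modelled.

```python
def median_1d(seq, m):
    """Running median with an odd window length m; end samples are copied unchanged."""
    half = m // 2
    out = list(seq)
    for i in range(half, len(seq) - half):
        out[i] = sorted(seq[i - half:i + half + 1])[half]
    return out


def separable_median(image, m=3):
    """Row-wise running medians followed by column-wise running medians."""
    rows = [median_1d(row, m) for row in image]
    cols = [median_1d(list(col), m) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]


flat = [[20] * 5 for _ in range(5)]
flat[2][2] = 255
print(separable_median(flat)[2][2])  # 20
```

Each pass uses only m samples per median instead of m², at the cost of not always matching the full 2-D median.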
CONCLUDING REMARKS
A general-purpose VLSI median filter unit consisting of two single-chip median filters, and its applications, are presented. The architectures of the chips are modular and have regular communication schemes, which make their VLSI implementations rather easy and straightforward. Neither architecture is well suited to larger window sizes, since the area grows in proportion to w². We have chosen w = 9 because this (a 3 × 3 window) is the most commonly used window size in two-dimensional median filtering applications.
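The quadratic growth can be made concrete by counting compare-and-swap units and chips. The sketch below counts only the units between adjacent data lines of a w-line odd/even transposition network (the chip's extension units are excluded) and uses the ⌈w/9⌉² chip bound from the APPLICATIONS section; the function names are hypothetical.

```python
import math


def chips_needed(w, chip_width=9):
    """Upper bound from the APPLICATIONS section: ceil(w / 9) ** 2 extensible chips."""
    return math.ceil(w / chip_width) ** 2


def compare_and_swap_units(w):
    """Units in a w-line odd/even transposition network: w stages of (w - 1) // 2 units."""
    return w * ((w - 1) // 2)


for w in (9, 25, 49, 81):
    print(w, chips_needed(w), compare_and_swap_units(w))
# 9 1 36
# 25 9 300
# 49 36 1176
# 81 81 3240
```

Both counts grow roughly as w², so going from a 3 × 3 window (w = 9) to a 9 × 9 window (w = 81) multiplies the hardware by about 81 to 90 times.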
The main contributions of this study are the architecture of the extensible median filter and its VLSI implementation. Another achievement of this study is the implementation of the real-time median filter, which can operate at the real-time rate for 1024 × 1024 frames.
