Abstract: There are arithmetic problems in the hardware realisation of bit-level median filtering algorithms. A majority gate composed of output-wired inverters is proposed. Its area and time complexities are better than those of the digital and analogue designs now available. The circuit is applied to a median filter design based on majority selection, so that the arithmetic computation problems are avoided. The filter has a bit-sliced architecture with constant cycle time. Window shapes can be changed arbitrarily through mask-and-set modules. A median filtering system for two-dimensional image processing is presented. A binary majority gate is also an essential element in the decision-making circuitry used in fault-tolerant computing systems, artificial neural networks and related applications.
Introduction
Signal smoothing using median filters has grown popular in recent years because of its simple operation and robust performance. A window with an odd number of elements is defined, which slides across the digitised input sequence. The median filter simply takes the middle value of the elements lying in the window as the window moves through the input sequence step by step. Many algorithms and methods have been developed and used in median selection. They can be classified into two categories: word-level and bit-level. In word-level algorithms, the basic operation is applied to a word. Selecting the median value from a sorted sequence is the simplest method. Bubble sort, selection sort, quick sort and odd-even transposition sort are common examples. Only a minor part of the data comes in and goes out on each movement of the window. However, these methods are not suitable for hardware implementation because of their irregular data structures and operations. Only the odd-even transposition sort has been chosen to realise median filtering in hardware [11-14].
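As a point of reference for the word-level approach described above, the following minimal Python sketch sorts each window and takes its middle element. The function name and the choice to produce outputs only where the window fits entirely inside the input are assumptions made for illustration; they are not taken from the cited designs.

```python
def median_filter_sorted(signal, window_size):
    """Word-level median filter: sort each window and take the middle element."""
    if window_size % 2 == 0:
        raise ValueError("window size must be odd")
    half = window_size // 2
    output = []
    # Slide the window one step at a time over positions where it fits entirely.
    for centre in range(half, len(signal) - half):
        window = signal[centre - half:centre + half + 1]
        output.append(sorted(window)[half])
    return output


# The impulses 80 and 90 are removed by a window of three elements.
smoothed = median_filter_sorted([2, 80, 6, 3, 4, 90, 5], 3)
print(smoothed)
```

Every window is re-sorted from scratch here, which illustrates why such word-level methods map poorly onto regular hardware.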
In bit-level algorithms, the median result comes out one bit at a time. Usually, there is a mask vector to define the effective subset in which the target result lies. The effective subset shrinks as the inspection proceeds from the most significant bits (MSBs) to the least significant bits (LSBs). The major difference among these algorithms lies in the counting schemes they use. In Reference 15, the number of '0' bits among the effective subset at each bit position was counted. In Reference 16, the '1' bits were counted in each inspection. Both '0' and '1' bits were counted separately in Reference 17, and either of the two numbers was used for the subsequent calculation. Hardware architecture designs for the last two algorithms were proposed in References 18 and 19, respectively. The algorithm in Reference 15 was formulated mathematically in Reference 20. Recently, two similar algorithms based on majority selection were developed independently in References 21 and 22, but with different concerns. Majority selection can be regarded as a special case of rank selection based on the positive Boolean function discussed in Reference 23.
In hardware implementation, algorithms based on binary radix are better for the following reasons:
(a) It is more intuitive and simple to derive combinational functions on binary variables.
(b) The basic modules are small, regular and highly repeatable.
(c) Their hardware complexity increases linearly with window size and word length of binary representation.
Most of the word-level algorithms are more suitable for software implementation. There is yet another bit-level method, which selects the median from a sequence of threshold-decomposed signals [24]. However, its complexity increases exponentially with word length and it is therefore not practical for hardware implementation.
These bit-level median filtering algorithms basically perform a binary search among the unsorted data, and their masking functions are similar. The difficulty in hardware implementation is the counting circuit, which for speed is implemented either by a large adder tree or by a large combinational Boolean function. We now consider whether there is a method that reduces this problem while preserving high throughput. The majority gate is the target.
2
A binary majority gate, as shown in Fig. 1, has W inputs, where W is usually odd. The output is '1' if more than half of its inputs are '1'; otherwise it is '0'. A number of designs can perform this job. They are briefly discussed in the following sections.
An adder tree can be used to speed up the summation, followed by comparison logic which flags the majority result. If a binary majority function is implemented by an adder tree followed by a comparator, its hardware expands linearly with W while the delay time increases with log W.
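The adder-tree approach can be sketched in software as follows. This is only a behavioural model, assuming a pairwise summation tree; the function names are illustrative. The number of tree levels returned corresponds to the log W growth of the delay mentioned above.

```python
def adder_tree_sum(bits):
    """Sum the input bits pairwise, level by level.

    Returns (total, levels), where levels is the depth of the tree.
    Assumes at least one input bit.
    """
    values = list(bits)
    levels = 0
    while len(values) > 1:
        # Add neighbouring pairs; an unpaired last value passes through.
        paired = [values[i] + values[i + 1] for i in range(0, len(values) - 1, 2)]
        if len(values) % 2 == 1:
            paired.append(values[-1])
        values = paired
        levels += 1
    return values[0], levels


def majority_adder_tree(bits):
    """Comparator stage: flag '1' when more than half of the inputs are '1'."""
    total, _ = adder_tree_sum(bits)
    return int(2 * total > len(bits))


print(majority_adder_tree([1, 0, 1, 1, 0, 0, 1, 0, 1]))
print(adder_tree_sum([1] * 9))
```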
Threshold logic gate:
A majority function is a special case of a threshold logic gate in which the threshold T is equal to (W + 1)/2. An early circuit implementation of a threshold logic gate was a voltage divider built from resistor-transistor logic (RTL) circuits [25]. A MOS transistor version of the same circuit was presented in Reference 28, which had a resistor network for weighting the inputs and a voltage source in series with the drain to determine the threshold level. Although this design is simple, the resistors are area-consuming.
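The relation between the two functions can be written down directly. The sketch below is a purely logical model with illustrative names; it says nothing about the resistor or transistor circuits cited above.

```python
def threshold_gate(bits, threshold, weights=None):
    """Output '1' when the weighted sum of the inputs reaches the threshold."""
    if weights is None:
        weights = [1] * len(bits)  # equal weighting on each input
    weighted_sum = sum(w * b for w, b in zip(weights, bits))
    return int(weighted_sum >= threshold)


def majority_gate(bits):
    """Majority as the special case T = (W + 1)/2 with equal weights (W odd)."""
    window_size = len(bits)
    return threshold_gate(bits, (window_size + 1) // 2)


print(majority_gate([1, 1, 0]), majority_gate([1, 0, 0]))
```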
Voltage level comparison:
This is an analogue approach which detects the difference between a voltage divider output and a reference voltage. A design example in Reference 24 realised the voltage divider with nMOS circuits and compared the voltage levels with a differential amplifier.
Device programmable CMOS majority gate
Our majority circuit design is based on the voltage divider in Reference 24, but CMOS technology is used instead of the nMOS design. The differential amplifier is replaced by an inverter to simplify the design. This saves the area of the differential comparator and halves the number of input signals, because the nMOS voltage divider required both positive and negative inputs.
The majority gate is shown in Fig. 2. It is made up of two parts: a nonlinear voltage divider built from output-wired inverters on the left-hand side, and an inverting buffer on the right which senses the majority transition and provides a positive output. In addition, this inverting buffer serves another two purposes: it isolates the divider output node from external circuitry to reduce [...]
As T increases, the output voltage V steps down a little at first, then the step size broadens at some middle values. Thereafter, the step size shrinks again and finally goes to zero as all input bits become '1'. The current through this divider changes in the same way as the output voltage step size: it increases at the middle values and goes down to zero at both ends. This behaviour shows that the circuit acts like a spatial inverter, as the output will be '1' if '0' is the majority among the input signals, and vice versa. The underlying mechanism is that the dynamic behaviour of a single inverter is spatially quantised by the W-input divider.
[...] channel lengths are fixed. The last of these three techniques may be the best choice because it is process-independent and the easiest to design. If we choose nine inputs, for example, the majority transition should appear when T goes from four to five. Note that nine [...]
For the same design process, a majority circuit with 25 inputs has also been designed and simulated. Its layout dimensions are listed in Table 1 together with those of a 9-majority circuit. Note that the 1.2 ns time delay is equal to two inverter delays. The power consumption is calculated by assuming that T is uniformly distributed. This design is insensitive to process variation if (Q,, -QL) of the buffering inverter is 1 V. However, the noise margin of the 25-majority circuit is zero in the best case, which is why the 25-majority circuit is greatly influenced by process variation and external changes.
We think that a majority circuit with fewer than nine inputs can be designed and can work properly under the process variation we considered. However, the yield will be lower as more inputs are required; in that case, a more carefully controlled process is needed. For an even higher number of inputs, some extra circuitry should be added to compensate for the process variation and to detect the smaller transition gap.
During the simulations with varying channel width ratios, we found that the maximum transition shifted as Wp/Wn increased. For example, when Wp/Wn = 13/13 = 1, the maximum transition step is at T = 2-3; when Wp/Wn is between 2 and 2.5, it occurs at T = 4-5; and when the ratio falls between 3 and 4, it appears at T = 5-6. These phenomena are plotted in Fig. 7. As a matter of fact, one can obtain the desired maximum voltage transition gap by adjusting only the channel width ratio. Together with a proper choice of the Wp/Wn ratio of the inverting buffer, the same circuit structure becomes a threshold logic gate with equal weighting on each input. The threshold level is programmed by the channel width ratio of the inverters. The majority gate thus becomes a special case.
In summary, there are three steps in building a majority circuit. First, the majority transition gap characteristics with respect to different channel width ratios should be obtained. Secondly, the threshold voltage variations of an inverter with respect to different channel width ratios should be available. Thirdly, a proper Wp/Wn ratio with a maximum transition gap must be selected for the nonlinear voltage divider, together with an inverter whose threshold voltage equals the middle point of this gap. The cascading of the two devices makes up the majority gate.
The simple and regular CMOS design of this majority gate is very attractive. However, programming the device geometries for different rank orders at the design stage decreases the run-time flexibility, because the rank order is fixed once the majority gate has been designed.
3
Bit-level median filtering algorithm based on majority selection
In a one- or two-dimensional situation, the total number of elements lying in the sliding window can be denoted by W (or W = W1 × W2), which is called the window size. The elements in the window are indexed by i = 1, ..., W. These data are taken to be non-negative integers [...]. M_k is referred to as a masking vector and S_k is referred to as a setting vector [...].
The algorithm
Generally speaking, a bit-level median filtering algorithm determines the kth most significant output bit by inspecting the kth most significant bits of all elements in the window. Starting from the first MSB, one checks whether '1' or '0' is the majority; the median is in the subset whose MSB is the majority bit. Thus we set the masking flags m_k(i) to '1' to indicate the desired subset where the median value stays, and force the elements which are not in the desired subset to a local extreme value by setting the corresponding setting flags s_k(i) to the opposite value of the present output bit. Once m_k(i) becomes '0', the ith element in the window is no longer in the desired subset and the setting value takes over for the rest of the calculation, i.e. m_l(i) = 0 and s_l(i) = s_k(i) for l = k + 1, ..., N. For binary signals, the median and the majority value are the same. The setting flags help to preserve the rank order of the expected median result, so that a majority selection takes place at each bit position from MSB to LSB. Let the kth MSB of the median output result be denoted by u_k, and let c_k(i) be a temporary signal corresponding to b_{N-k+1}(i) and s_k(i). We use '∧', '∨' and '¬' to denote the logic AND, OR and NOT operations, respectively. The new median filtering algorithm can be formally written as follows:
(i) Initially, all elements in the window are in the desired subset: m_1(i) = 1 for i = 1, ..., W. [...] entries outside of the desired subset: [...]
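Because the formal steps are incomplete in the text above, the following Python sketch is a reconstruction from the prose description only, not the authors' exact formulation. It assumes that the temporary signal is c_k(i) = (m_k(i) ∧ b(i)) ∨ (¬m_k(i) ∧ s_k(i)), that the output bit u_k is the majority of the c_k(i), that the next setting flag equals c_k(i), and that an element leaves the desired subset as soon as its bit differs from u_k. All function and variable names are illustrative.

```python
def majority(bits):
    """Return 1 when more than half of the bits are 1 (odd number of bits)."""
    return int(2 * sum(bits) > len(bits))


def median_by_majority(window, n_bits):
    """Bit-level median selection using only majority and mask-and-set steps."""
    if len(window) % 2 == 0:
        raise ValueError("window size must be odd")
    mask = [1] * len(window)      # m(i) = 1: element i is still in the desired subset
    setting = [0] * len(window)   # s(i): value that replaces the bits of a masked-out element
    result = 0
    for position in range(n_bits - 1, -1, -1):   # from MSB down to LSB
        data = [(x >> position) & 1 for x in window]
        # Temporary signal: the data bit if still in the subset, else the setting flag.
        temp = [(m & b) | ((1 - m) & s) for m, b, s in zip(mask, data, setting)]
        out_bit = majority(temp)
        result = (result << 1) | out_bit
        # An element leaves the subset when its bit disagrees with the output bit.
        mask = [m & (1 - (b ^ out_bit)) for m, b in zip(mask, data)]
        # The new setting flag equals the temporary signal, so a rejected element
        # keeps the opposite of the output bit (a local extreme value).
        setting = temp
    return result


# Hypothetical data with N = 4 and W = 9.
window = [7, 2, 9, 4, 12, 5, 1, 14, 6]
assert median_by_majority(window, 4) == sorted(window)[len(window) // 2]
```

No addition or subtraction of data words appears in the loop; the only counting is inside the majority decision, which is the part the majority gate is meant to replace.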
The main idea of this new algorithm is to obtain the median output without changing its rank order. In other methods, the rank order of the median value is changed during the calculation. Here the point is to compress the values which are not in the desired subset to a local extreme value. This process guarantees that the rank order of the median value is unchanged. The advantage of preserving the rank order is that majority selection alone is enough and there is no need for arithmetic operations such as addition or subtraction. An example which demonstrates the new algorithm is shown in Fig. 8 with N = 4 and W = 9. Those elements which are not in the desired subset are marked by a circle, and the neighbouring S values are then taken instead to make a majority decision.
Discussion
There are two special properties of this algorithm. First, the majority is the median in a set of binary signals with an odd number of elements. Secondly, the mask-and-set operations presented in steps (v) and (vi) of the algorithm preserve the rank order of the median value through all N cycles of inspection. The former statement is clearly valid, and the latter is true because the operations are specially designed to do so.
For k = 1, the first cycle, we are finding the (W + 1)/2th largest (also the (W + 1)/2th smallest) data value in the [...] Combining the calculation of cases 1 and 2 from k = 1 to N, the masking operation should be steps (v) and (vi) of the algorithm.
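The first property in this discussion can be checked exhaustively for small windows. The short Python sketch below, with illustrative function names, enumerates every binary pattern of an odd window size and compares the majority with the middle element of the sorted pattern.

```python
from itertools import product


def majority(bits):
    """Return 1 when more than half of the bits are 1."""
    return int(2 * sum(bits) > len(bits))


def binary_median(bits):
    """Middle element of the sorted bits (odd number of bits)."""
    return sorted(bits)[len(bits) // 2]


def majority_equals_median(window_size):
    """Check all 2**window_size binary patterns of the given odd size."""
    return all(
        majority(pattern) == binary_median(pattern)
        for pattern in product((0, 1), repeat=window_size)
    )


print(all(majority_equals_median(w) for w in (1, 3, 5, 7, 9)))
```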
Flexible median filtering system
The advantage of this algorithm is evident from its hardware design. The mask-and-set operations are simple combinational logic and the majority is implemented by a novel design without arithmetic drawbacks.
Word-parallel and bit-pipelined design
Apart from the majority circuit, all functions are included in a mask-and-set (M/S) module. As the algorithm has been written in a single assignment form, every variable is assigned only once. The Boolean functions in an M/S module can be obtained by direct transformation from the software statements of steps (iii)-(vi). Let M, S, C and B be the masking, setting, intermediate and binary data bit, respectively. If we carefully check the Boolean functions of signals C and S, they are the same, which was verified in Reference 23. An M/S module is presented in [...]; it is rather simple and regular. It can be cascaded into a word-parallel and bit-pipelined design. If the M and S' signals are fed back to the same stage, a bit-serial and word-parallel design is formed.
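A possible Boolean model of one M/S module is sketched below in Python. The equations are assumptions reconstructed from the description (the next setting bit equals the intermediate bit C, and the mask is cleared when the data bit differs from the majority output U); the names `ms_select`, `ms_update` and `U` are hypothetical and do not come from the original figures.

```python
def ms_select(m, s, b):
    """Intermediate bit C sent to the majority gate: B if masked in, else S."""
    return (m & b) | ((m ^ 1) & s)


def ms_update(m, b, c, u):
    """Next-stage mask and setting bits, given the majority output U."""
    m_next = m & ((b ^ u) ^ 1)   # stay in the subset only if B agrees with U
    s_next = c                   # assumed: the setting bit has the same function as C
    return m_next, s_next


# With four binary signals (M, S, B, U) there are 16 patterns to enumerate.
for m in (0, 1):
    for s in (0, 1):
        for b in (0, 1):
            for u in (0, 1):
                c = ms_select(m, s, b)
                print(m, s, b, u, "->", c, ms_update(m, b, c, u))
```

Feeding `m_next` and `s_next` to the next module gives the cascaded arrangement; feeding them back to the same module gives the bit-serial one.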
As the cascaded N stages work in a pipeline fashion, the output may not be correct unless the I/O data bits are properly scheduled. Skewing delays are therefore needed to delay the input by one more time unit for each successive stage, so that the input data bits are fired at the correct time slot. After passing through the filter, the output result should be skewed back into its original data word. Therefore, 'deskewing' delays are placed at the output side.
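The timing idea can be illustrated with a small Python model. It assumes that stage 0 handles the MSB and that stage k receives its bit k time units later; the deskewing step then regathers the bits of each word, which is equivalent to delaying the output of stage k by N - 1 - k units. The function names and this indexing are assumptions for illustration only, and the model covers data alignment, not the filtering itself.

```python
def skew(words, n_bits):
    """Return schedule[t][k]: the bit presented to stage k at time t.

    Idle slots are marked with None.
    """
    total_time = len(words) + n_bits - 1
    schedule = [[None] * n_bits for _ in range(total_time)]
    for t, word in enumerate(words):
        for k in range(n_bits):
            bit = (word >> (n_bits - 1 - k)) & 1   # stage 0 gets the MSB
            schedule[t + k][k] = bit               # one extra delay per stage
    return schedule


def deskew(schedule, n_bits):
    """Reassemble words from a skewed schedule of per-stage bits."""
    n_words = len(schedule) - n_bits + 1
    words = []
    for t in range(n_words):
        word = 0
        for k in range(n_bits):
            word = (word << 1) | schedule[t + k][k]
        words.append(word)
    return words


samples = [5, 10, 3]
assert deskew(skew(samples, 4), 4) == samples
```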
Note that the shift register column butted to each stage is called a window buffer. Only one additional row of skewing delays is required for each stage, including both the input side and the output side, rather than W rows of delays for each stage as in similar designs [11, 13, 18]. A bit-sliced architecture with skewing delays is shown in Fig. 11. In the hardware realisation of a bit-level median filtering algorithm, the area complexity is dominated by the basic boxes which define the effective subset, and the time delay of a single stage is dominated by the counting circuits. These features are compared in Table 3. The majority design is better because of its linear complexity with constant cycle time.
System architecture design
Owing to the advance of VLSI technology, many software algorithms are embedded on a single VLSI chip for cost and performance considerations. Modular, regular and repeatable structures are preferred in such a VLSI system. The bit-sliced approach of median filtering provides a bit-level scalable hardware structure which is adaptive to changeable word length. The repeatable nature makes it very attractive to a VLSI design.
In real-time two-dimensional image smoothing using a median filter with window size 3 × 3, we need buffers to store the data of the former two scan lines as the window raster-scans over the image. The structure is shown in Fig. 12. Scan line buffers stand for temporary storage, and a window buffer expands the pixels lying in the window into a column. A shaper butted to the window buffer provides four types of window shape: square, cross, 'X' and dot, to meet different background noise conditions. As illustrated in Fig. 13, useful pixel bits are indicated by '*', while unused pixel bits are masked by '1' or '0' in equal numbers, so that the rank order of the desired output result is not changed. The median selection unit is just the structure shown in Fig. 10. [...] where '-' denotes the 'don't-care' conditions. Thus, through initial assignments of the m_1(i) and s_1(i) signals, the window shape can be arbitrarily chosen. Each position in the window can now be assigned separately. This feature allows very high flexibility in using this structure for a variety of image or signal characteristics.
Testing is a very important issue in today's VLSI systems. In ad hoc testing, it is better to partition a large system into several independent submodules so that they can be tested separately. Let us consider the testable design of a single stage, as the other bit-slices can be tested in the same way. Scan lines and window buffers are shift registers or memories in nature, and there are standard procedures to test them. Meanwhile, they can serve as the scan path buffer for testing the median selection unit. The problem in testing a median selection unit is the poor observability of the signals C from the M/S box to the majority gate. Scan path registers can be inserted here to improve the testability of the majority gate. The M/S modules can be tested independently at the same time by 16 patterns for an exhaustive functional test. The majority circuit can be tested in the same way.
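The window-shape mechanism described in this section can be sketched in Python as follows. The shape patterns are hypothetical and are not copied from Fig. 13: '*' marks a useful pixel, while '1' and '0' mark unused pixels forced to the maximum and minimum value in equal numbers. The selection loop assumes the same reconstructed mask-and-set equations, c = (m ∧ b) ∨ (¬m ∧ s), with the next setting flag equal to c.

```python
# Row-major 3 x 3 patterns (assumed for illustration).
SHAPES = {
    "square": "*********",
    "cross":  "0*1***1*0",
    "x":      "*0*1*1*0*",
    "dot":    "0101*1010",
}


def shaped_median(pixels, shape, n_bits=8):
    """Median over the useful pixels of a 3 x 3 window, using majority only."""
    pattern = SHAPES[shape]
    # Initial masking and setting flags define the window shape.
    mask = [1 if p == "*" else 0 for p in pattern]
    setting = [0 if p == "*" else int(p) for p in pattern]
    result = 0
    for position in range(n_bits - 1, -1, -1):   # from MSB down to LSB
        data = [(x >> position) & 1 for x in pixels]
        temp = [(m & b) | ((1 - m) & s) for m, b, s in zip(mask, data, setting)]
        out_bit = int(2 * sum(temp) > len(temp))   # majority decision
        result = (result << 1) | out_bit
        mask = [m & (1 - (b ^ out_bit)) for m, b in zip(mask, data)]
        setting = temp
    return result


pixels = [10, 200, 30, 40, 50, 60, 70, 80, 90]
for name in SHAPES:
    print(name, shaped_median(pixels, name))
```

Because the unused positions are split evenly between the two extremes, the middle rank of the nine entries always falls on the median of the useful pixels.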

5
Concluding remarks
A simple design of a majority gate was proposed, which consists of output-wired inverters. It uses fewer transistors and has a constant delay time. The majority selection is programmed by choosing the channel width ratios of the p- and n-channel transistors in the CMOS circuits. This majority gate was applied to implement a median filtering algorithm based on majority bit selection. A VLSI system architecture design for two-dimensional median filtering was also demonstrated. It is window-shape changeable, bit-level scalable and easy to implement. It is a flexible system for high-speed signal smoothing.
The mask-and-set operation is a basic function in the cellular logic arrays used in signal processing. The architecture proposed in the last section is in fact a special-purpose design for two-dimensional image processing. These simple cells should be applicable to many other signal processing applications such as speech/image smoothing, stack filters and morphological filtering.
By adjusting the channel width ratios, the majority circuit may become a threshold logic gate with equal input weighting. This approach may play an important role in majority-decision applications, such as threshold decoding circuits, fault-tolerant systems, binary artificial neural networks and many other designs that involve threshold decisions.
