Median-Rational Hybrid Filters (MRHFs) were recently introduced as a new class of nonlinear filters and applied to image filtering problems. It has been shown that MRHFs have the inherent property that in smooth areas they provide good noise attenuation, whereas in changing areas noise attenuation is traded for a good response to the change. Moreover, they operate on a small window and require only a few operations, resulting in simple and fast filter structures. In this paper we present a hardware implementation of the MRHF which effectively exploits the features and the robustness of both median filters and rational filters. Thanks to its reduced hardware complexity, this architecture is suitable for real-time use in applications where size and cost are critical.
Introduction
Image restoration in a noisy environment is a fundamental problem in image processing. Various filtering techniques have been developed to suppress noise in order to improve the quality of images [3]. Among these, nonlinear digital techniques have recently received increasing interest due to their superior capabilities with respect to linear approaches. This paper focuses on the hardware implementation of the MRHF, a new class of nonlinear rational-type hybrid filters recently introduced in [1], [2]. MRHFs are based on rational functions, and the use of this function brings several advantages. Like a polynomial function, a rational function is a universal approximator (it can approximate any continuous function arbitrarily well); however, it can achieve a desired level of accuracy with lower complexity, and possesses better extrapolation capabilities.
The MRHF uses a two-step approach to remove both impulsive and Gaussian noise from an image. The filter is formed by a rational operator whose inputs are the outputs of three sub-functions, i.e. two median filters (MF), with outputs $\Phi_1$ and $\Phi_3$, and one center weighted median filter (CWMF), with output $\Phi_2$:

$$ y = \Phi_2 + \frac{\sum_{j=1}^{3} \alpha_j \Phi_j}{h + k\,(\Phi_1 - \Phi_3)^2} \qquad (1) $$

where $\alpha = [1, -2, 1]$ and $h, k > 0$ are constant parameters. As can be seen from (1), the most demanding arithmetic operations are the square function and the division.

The main problem introduced by the square function is the increase in word width, i.e. in the number of bits of the result, whereas in a residue number system (RNS) all operations are evaluated with a constant number of bits. In our work the operator is fixed; thus we can exploit our knowledge of the parameters appearing in the function to reduce its complexity. Two parameters, $h$ and $k$, are used in the denominator of (1). In particular, we can rewrite (1) as follows:

$$ y = \Phi_2 + \frac{\Phi_1 - 2\Phi_2 + \Phi_3}{h + \left(\sqrt{k}\,(\Phi_1 - \Phi_3)\right)^2} $$
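To make the structure of the operator concrete, here is a minimal floating-point sketch of the MRHF output for a single 3x3 window. It assumes that (1) has the form y = Φ2 + (Φ1 - 2Φ2 + Φ3) / (h + k(Φ1 - Φ3)²) with h = 6.25 and k = 0.01 (the values quoted later in this section). The choice of the two median sub-windows ('+'-shaped and 'x'-shaped) and the CWMF center weight of 3 are hypothetical, chosen for illustration only, and are not taken from [1], [2].

```python
from statistics import median

H, K = 6.25, 0.01  # denominator parameters quoted in the text


def cwm(samples, center, weight):
    """Center weighted median: the center sample is counted `weight` times."""
    return median(list(samples) + [center] * (weight - 1))


def mrhf_pixel(win, h=H, k=K, center_weight=3):
    """Floating-point MRHF output for one 3x3 window (list of 3 rows).

    Hypothetical sub-filter choice, for illustration only:
      phi1: median of the '+'-shaped sub-window (center and 4-neighbours)
      phi3: median of the 'x'-shaped sub-window (center and 4 diagonals)
      phi2: center weighted median over the full 3x3 window
    """
    c = win[1][1]
    plus = [win[0][1], win[1][0], c, win[1][2], win[2][1]]
    cross = [win[0][0], win[0][2], c, win[2][0], win[2][2]]
    full = [v for row in win for v in row]

    phi1, phi3 = median(plus), median(cross)
    phi2 = cwm(full, c, center_weight)

    numerator = phi1 - 2 * phi2 + phi3        # alpha = [1, -2, 1]
    denominator = h + k * (phi1 - phi3) ** 2  # always >= h > 0
    return phi2 + numerator / denominator


# A flat patch with a single impulse at the center: the impulse is removed.
print(mrhf_pixel([[100, 100, 100], [100, 255, 100], [100, 100, 100]]))  # prints 100.0
```

When the two medians agree (smooth areas) the denominator stays close to h and the rational term acts fully; when they disagree (edges) the squared difference grows and the correction term is attenuated, which is the trade-off described in the abstract.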
Hardware Implementation
The implementation of a filter can be carried out in several ways, depending on the application and on the constraints to be applied. In our work we decided to implement the MRHF on an FPGA, because FPGAs are easily programmable and the structures can be tested and modified very quickly. Obviously, this approach presents some limitations, mainly due to the constraints imposed by the inner structure of FPGAs.
As can be seen from its structure, the MRHF is built from a block containing the three median-type filters, which involve sorting the incoming samples from the image, and a block which realizes the rational function and therefore requires some arithmetic functions.
Each of these sections presents some problems when it has to be implemented. Since median filters and weighted median filters have been widely studied and many references can be found in the literature, our attention focuses on the implementation of the rational function.
Rational Function
The rational function used in the MRHF is quite simple; nevertheless, the design of this block presents some problems if we want to achieve compactness in size and high computation speed when it is implemented on FPGAs.
As a first approach, we decided to use a residue number system (RNS) to perform the calculation of both the numerator and the denominator of the rational operator, because it allows a high degree of parallelism in the operations. A residue number system, though, usually implies redundancy and therefore needs more hardware. Moreover, adders and multipliers in RNS are non-standard structures which require ad hoc hardware. On the other hand, FPGAs supply built-in operators optimized for the standard binary system, such as adders/subtractors and comparators; exploiting these blocks allowed us to obtain faster and smaller circuitry.
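The parallelism that motivated the RNS approach can be illustrated with a small sketch: every residue channel is added or multiplied independently, with no carries between channels, and the binary value is recovered with the Chinese remainder theorem. The moduli set {15, 16, 17} is a hypothetical choice (only a modulo-17 channel is mentioned in this paper), picked because it is pairwise coprime and covers the range [0, 4080). The modular inverse uses `pow(x, -1, m)`, available from Python 3.8.

```python
from math import prod

# Hypothetical moduli set: pairwise coprime, dynamic range M = 15 * 16 * 17 = 4080.
MODULI = (15, 16, 17)
M = prod(MODULI)


def to_rns(x):
    """Residues of x in each channel."""
    return tuple(x % m for m in MODULI)


def rns_add(a, b):
    """Channel-wise modular addition: no carry propagates between channels."""
    return tuple((x + y) % m for x, y, m in zip(a, b, MODULI))


def rns_mul(a, b):
    """Channel-wise modular multiplication."""
    return tuple((x * y) % m for x, y, m in zip(a, b, MODULI))


def from_rns(r):
    """Reconstruct the integer in [0, M) with the Chinese remainder theorem."""
    x = 0
    for ri, m in zip(r, MODULI):
        mi = M // m
        x += ri * mi * pow(mi, -1, m)  # modular inverse of mi modulo m
    return x % M


print(from_rns(rns_add(to_rns(1000), to_rns(2000))))  # prints 3000
print(from_rns(rns_mul(to_rns(60), to_rns(60))))      # prints 3600
```

Each channel needs its own modular adder (for example, a 5-bit adder with a reduction modulo 17), which is exactly the kind of non-standard structure that cannot use the FPGA's built-in binary adders.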
The features of a system depend very much on the architecture chosen for the implementation. In image processing, the amount of computation to be performed per unit time is very high even if the image size is small, and the system should be able to read and output data at a high rate. For this purpose we decided to implement our rational operator with a pipelined structure. The main drawback of this choice is the large number of memory elements required at each step, while the FPGA has a relatively small number of built-in memory elements. Therefore, the system has to be designed carefully.

Following the example in [4], the two parameters of the denominator of (1) have been chosen as $h = 6.25$ and $k = 0.01$. If we approximate $\sqrt{k}$, the factor that multiplies the difference of the two median filter outputs in the rewritten operator, with a simple sum of powers of 2, the number of bits required by this operation can be reduced. For instance, if $\sqrt{k} \approx 2^{-4} + 2^{-5} = 0.09375$ (the exact value is $0.1$), the range of the term to be squared is limited to 5 bits instead of 8, since $255 \times 0.09375 \approx 23.9 < 2^5$. Moreover, the square function can be implemented as a simple shift-and-add multiplier with a precision of 11 bits in the result.
Simulation runs over several images showed that the loss of precision introduced by this approximation is very small, and the evaluation of the denominator has been simplified to 6 additions.
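The power-of-two approximation of the denominator can be checked with a short integer sketch. It assumes that |phi1 - phi3| is an 8-bit unsigned value and that the scaling by 2^-4 + 2^-5 uses truncating right shifts; the text does not say whether the hardware rounds at this point or how the 11-bit precision of the square is obtained, so the word widths below are illustrative only.

```python
H = 6.25
K = 0.01
SHIFTS = (4, 5)  # sqrt(k) ~ 2**-4 + 2**-5 = 0.09375 (exact sqrt(0.01) = 0.1)


def scaled_diff(diff):
    """|phi1 - phi3| * (2**-4 + 2**-5) computed with truncating right shifts only."""
    d = abs(diff)
    return sum(d >> s for s in SHIFTS)


def shift_add_square(t):
    """t * t as a shift-and-add multiplier: add t << i for every set bit i of t."""
    acc, i, m = 0, 0, t
    while m:
        if m & 1:
            acc += t << i
        m >>= 1
        i += 1
    return acc


def denominator_approx(diff):
    return H + shift_add_square(scaled_diff(diff))


def denominator_exact(diff):
    return H + K * diff ** 2


# Largest term to be squared for an 8-bit difference: it fits in 5 bits.
print(scaled_diff(255), scaled_diff(255).bit_length())  # prints 22 5
for diff in (0, 40, 120, 255):
    print(diff, denominator_exact(diff), denominator_approx(diff))
```

Because the shifts discard the fractional bits, this sketch underestimates $k(\Phi_1 - \Phi_3)^2$ for large differences; keeping one or more fractional bits of the scaled difference would reduce that error at the cost of a slightly wider multiplier.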
In this design the role of the division is the most critical, since this operation is highly time- and area-consuming.
The choice of the actual implementation of this operation strongly depends on the design constraints and on the precision required by the rational operator and by the application. In fact, some rational operators, such as the interpolator, do not tolerate a coarse approximation of their results if they are to work effectively.
We developed two different algorithms for the division.
The first one is an iterative algorithm derived from [5], which can be implemented both in RNS and in standard binary. This algorithm yields a precision of about 1% after 4 iterations. Further tests on images showed that 3 iterations are enough to obtain a good result, which gives a considerable saving in size. Nonetheless, this algorithm requires a few combinational blocks or lookup tables for the selection of the quotient, as well as variable shifts, since the new dividend is obtained with the standard formula $y_n = y_{n-1} - q_{n-1} \times x$, where $q_{n-1}$ is the partial quotient at iteration $n-1$. Moreover, the latency would be greatly increased by the need for several iterations.

The second algorithm is very simple and is based on successive scalings of both numerator and denominator. Given two numbers $n$ and $d$, the division $n/d$ can be represented as follows:

$$ \frac{n}{d} = \frac{n_s\, s^{e_n}}{d_s\, s^{e_d}} = n_s\, d_s^{-1}\, s^{e_n - e_d} $$
where $s$ is a scaling factor, $n_s$ and $d_s$ are respectively the scaled numerator and the scaled denominator, and $e_i$ ($i = n, d$) is the number of scalings applied to the corresponding operand. In this case we choose $s = 16$, so that each scaling is performed simply by discarding the 4 least significant bits, and rounding is accomplished by adding 1 or 0 depending on the 4 discarded bits.
Since the numerator and the denominator are represented with 10 and 11 bits respectively, after at most 2 scalings by 16 their range is reduced to [0, 16], taking into account the rounding effect. A small lookup table supplies the corresponding inverse of the scaled denominator, $d_s^{-1}$.
Since at most 2 scalings are needed, the difference $e_n - e_d$ can only assume values in the interval [-2, 2], which reduces the complexity of the shift.
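The two division schemes can be compared in a small integer sketch. The scaling-based divider follows the description above: s = 16, rounding on the discarded bits, a lookup table for the inverse of d_s, and a final shift by 4(e_n - e_d). The number of fractional bits (F = 8) and the single rounding step after all scalings are assumptions. The iterative divider is only a generic shift-and-subtract recurrence of the form y_n = y_{n-1} - q_{n-1} * x in which each partial quotient is the largest power of two that fits; it is not the algorithm of [5], whose quotient-selection rule is not given here.

```python
S_BITS = 4             # s = 16: each scaling drops 4 bits
S = 1 << S_BITS
F = 8                  # fractional bits of the results (assumption)
# Lookup table for d_s**-1 in fixed point, d_s in [1, 16]; index 0 is never used.
INV = [0] + [round((1 << F) / v) for v in range(1, S + 1)]


def scale(x):
    """Return (x_s, e): the minimal e with x >> 4e < 16, and x_s rounded into [0, 16]."""
    e = 0
    while (x >> (S_BITS * e)) >= S:
        e += 1
    if e == 0:
        return x, 0
    shift = S_BITS * e
    return (x >> shift) + ((x >> (shift - 1)) & 1), e  # round on the discarded bits


def scaled_divide(n, d):
    """n/d in fixed point (F fractional bits) as n_s * d_s**-1 * s**(e_n - e_d)."""
    assert d > 0
    sign = -1 if n < 0 else 1
    n_s, e_n = scale(abs(n))
    d_s, e_d = scale(d)
    q = n_s * INV[d_s]
    shift = S_BITS * (e_n - e_d)  # e_n - e_d lies in [-2, 2] for 10/11-bit inputs
    q = q << shift if shift >= 0 else q >> -shift
    return sign * q


def iterative_divide(n, d, iterations):
    """Generic recurrence y_n = y_{n-1} - q_{n-1} * d with power-of-two q (sketch)."""
    assert d > 0
    sign = -1 if n < 0 else 1
    y, q = abs(n) << F, 0
    for _ in range(iterations):
        if y < d:
            break
        j = y.bit_length() - d.bit_length()
        if (d << j) > y:
            j -= 1
        q += 1 << j
        y -= d << j
    return sign * q


for n, d in ((510, 6), (-300, 700), (100, 7)):
    print(n / d, scaled_divide(n, d) / (1 << F), iterative_divide(n, d, 3) / (1 << F))
```

The scaling divider needs only a 17-entry table, one small multiplier and a shift restricted to five values, which is why its latency is much shorter than that of an iterative scheme; the price is a coarse relative precision when the scaled operands are small.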
As the results in the following section show, the first algorithm outperforms the second one and allows us to obtain even better results than the theoretical algorithm. The second algorithm, though, is very compact and introduces a latency which is less than 1/4 of the latency of the other algorithm. Moreover, the required hardware can be easily and effectively implemented by exploiting the features of the FPGA. The block diagram of the resulting circuitry for the rational function is shown in Fig. 2. We implemented the system on a medium-size FPGA containing 400 Configurable Logic Blocks (CLBs).
As can be seen from (1), the numerator is a very simple linear function; it requires two 8-bit adder/subtractors and one 9-bit adder. As mentioned above, the design tools for FPGAs allow us to exploit some of the features of these chips. As a comparison with the RNS, an 8-bit subtractor in standard binary requires 6 CLBs and can be operated at about 58 MHz; on the other hand, a 5-bit adder for modulo-17 addition requires 7 CLBs and its operating frequency is about 56 MHz. This simple example justifies the choice of the binary system, given the simplicity of the rational operator.
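The numerator datapath can be sketched in a few lines, assuming (as the adder count suggests) that the numerator of (1) is phi1 - 2*phi2 + phi3 with 8-bit unsigned inputs. The grouping into two subtractors followed by one adder, and the word widths noted in the comments, are our reading of the text rather than a description of the actual netlist.

```python
def numerator(phi1, phi2, phi3):
    """phi1 - 2*phi2 + phi3 with two subtractors and one adder (8-bit inputs)."""
    a = phi1 - phi2  # 8-bit subtractor -> 9-bit signed result in [-255, 255]
    b = phi3 - phi2  # 8-bit subtractor -> 9-bit signed result in [-255, 255]
    return a + b     # 9-bit adder -> 10-bit signed result in [-510, 510]


def fits_signed(value, bits):
    """True if value is representable in two's complement with `bits` bits."""
    return -(1 << (bits - 1)) <= value < (1 << (bits - 1))


extremes = [numerator(255, 0, 255), numerator(0, 255, 0)]
print(extremes)                                   # prints [510, -510]
print(all(fits_signed(v, 10) for v in extremes))  # prints True
```

Computing 2*phi2 is never needed explicitly: subtracting phi2 twice in parallel keeps both first-stage operations at 8 bits, which matches the 10-bit numerator width used by the divider.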
The denominator is slightly more demanding in terms of hardware because of the scaling and the multiplication. The number of CLBs required for the denominator, which includes the flip-flops for the pipeline, is 2.5 times that required for the numerator.
The division itself requires as much hardware as the numerator and the denominator combined, even though the implemented algorithm is relatively simple. The main timing constraints derive from this block, mainly because of the arbitration logic used to choose the appropriate result instead of a traditional shift register.

The total number of CLBs used for the rational function is 191, about 47% of the total number at our disposal; the number of flip-flops is 182, and almost 1/3 of them are used to store the CWMF output $\Phi_2$ during the evaluation of the rational function. The maximum achievable frequency is 41 MHz; therefore this system can be effectively used for real-time applications with images of 768x625 pixels at a frame rate of 50 Hz. The layout on the FPGA is reported in Fig. 3.
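The real-time claim can be checked with a back-of-the-envelope calculation, assuming that the pipelined rational operator accepts one pixel per clock cycle (the text does not state the throughput per cycle explicitly):

$$ 768 \times 625 \times 50\ \mathrm{s}^{-1} = 2.4 \times 10^{7}\ \text{pixels/s} \;\Rightarrow\; f_{\text{required}} = 24\ \mathrm{MHz} < f_{\max} = 41\ \mathrm{MHz} $$

Under that assumption the design leaves a margin of about 17 MHz over the required pixel rate.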
Experimental results
The implemented system has been tested using images corrupted with i.i.d. noise having the following probability distribution:
Three values of $\lambda$ and several SNR values have been chosen: $\lambda = 0.1$ (very impulsive noise), $\lambda = 0.2$ (mixed Gaussian-impulsive noise) and $\lambda = 1$ (purely Gaussian noise), with SNR = 3 dB, 6 dB, 9 dB and 15 dB. As an example, Figs. 4(a) to 4(e) show, respectively, the original clean image, the noisy image with $\lambda = 0.1$ and SNR = 3 dB, the image filtered by the theoretical MRHF, the output of the implemented MRHF system with the high-precision division algorithm, and the output of the implemented MRHF with the low-precision division algorithm. It can be clearly seen that the images in Figs. […] so strongly as in the theoretical algorithm. The overall effect is a low-pass filtering which slightly improves the results. The results obtained with mixed noise show that in this case the simplified algorithm usually outperforms the theoretical operator.

In Fig. 5 we present comparisons of the results obtained for some images. For completeness, we report the results obtained with both the iterative (Hardware (1)) and the simplified (Hardware (2)) division algorithm. All the data have been normalized to the maximum.
Conclusions
In this paper we have presented a hardware implementation of the new median-rational hybrid filter, which is able to remove different kinds of additive i.i.d. noise. Our tests have shown that the implemented system is very robust and can easily be implemented on a small-sized FPGA. The maximum achievable frequency is about 40 MHz, which makes this filter very suitable for real-time applications.
