Abstract-This paper proposes to implement multifunctional image filters using multifunctional gates such as polymorphic gates or multiplexed ordinary gates. The design procedure is based on evolutionary design and optimization conducted using Cartesian genetic programming (CGP). Because of the complexity of the problem, the design is decomposed into two steps. In the first step, a multifunctional filter is evolved at the register-transfer level (RTL) using a set of processing elements containing functions such as minimum/maximum, minimum/average etc. over two pixels. In the second step, gate-level implementations of the processing elements utilized in evolved filters are designed and optimized using CGP in combination with conventional logic synthesis tools. It is shown that the resulting filters exhibit good filtering capabilities. They are also area-efficient in comparison with solutions based on multiplexing of ordinary filters.
I. INTRODUCTION
Obtaining flexibility, adaptation and multifunctionality directly at the hardware level is one of the most important goals we can now observe in the reconfigurable hardware field [1]. These objectives are usually achieved by hardware reconfiguration, which is performed by supplying a new configuration bit stream to a configuration memory. The configuration bit stream then activates the relevant configuration signals and switches which, when activated, establish new circuit connections and thus new functionality directly in hardware. The main advantage is that many new configurations (even those unanticipated at design time) can be created at runtime. However, the associated area/latency overhead caused by the reconfiguration infrastructure (configuration switches, network and memory) makes this type of system very expensive in applications where only a few preselected configurations have to be employed.
One possible approach to achieving low-cost reconfiguration is based on multifunctional gates. They have special physical structures that enable them to behave differently depending on external conditions. Recently developed graphene logic gates based on graphene pn junctions are capable of performing several logic functions just by adjusting some control voltages [2]. Various logic functions can also be provided by polymorphic CMOS gates introduced ten years ago. Their control is implemented via control voltages, power supply voltage (V dd ) or temperature [3], [4]. As the V dd-based control does not require any additional wires, the use of polymorphic gates could significantly reduce interconnecting networks in reconfigurable chips. In this paper, it is proposed to combine low-level image processing functions such as noise elimination and edge detection into one compact multifunctional circuit, a multifunctional filter. The resulting circuit is then composed of multifunctional gates connected using a network which is fixed and predesigned. Predefined filtering functions are activated using a suitable setting of control signals. It is expected that a significant reduction in the area and routing can be obtained in comparison to a conventional implementation, which is typically based on multiplexing of conventional filtering circuits.
Designing a compact multifunctional filter which contains multifunctional gates is a challenging task. In this paper, we will consider multifunctional circuits performing just two different functions (F 1 and F 2 ) and thus operating in two different modes (although extending the concept to more modes is straightforward). Formally, a single network containing multifunctional gates (and ordinary gates if needed) has to be constructed such that F 1 is implemented in the first mode and F 2 is implemented in the second mode (Fig. 1). The polymorphic NAND/NOR gate is a typical elementary component of such circuits [3], [4].
Various synthesis and optimization methods for multifunctional gate-level circuits, including evolutionary design and optimization, were proposed [5] , [6] . Evolutionary design has also been applied to design image filters suppressing a given type of noise [7] , [8] . However, image filter evolution has been conducted at the functional level since the gate-level representation is too low-level to evolve reasonably working filters.
In order to evolve multifunctional image filters, we propose to apply two-step evolution [9]. In the first step, a multifunctional filter is evolved at the register-transfer level (RTL) using a set of processing elements containing functions such as minimum/maximum, minimum/average etc. over two pixels [10]. It is assumed that such functions can easily be composed using multifunctional gates. In the second step, gate-level implementations of multifunctional processing elements utilized in evolved filters are designed and optimized using evolution. Both design tasks will be carried out using Cartesian genetic programming (CGP). We will demonstrate on three case studies that the resulting filters are functionally comparable with conventionally designed filters. The main contribution of this paper is that it shows that evolved filters are area-efficient in comparison with solutions based on multiplexing of ordinary filters.
II. MULTIFUNCTIONAL CIRCUITS
Logic function of multifunctional gates can be selected by various means, including external logic signals (such as the selector in multiplexers) or external voltage signals. When the control variable is V dd or temperature then there is no additional wire needed. For instance, a 6-transistor NAND/NOR gate controlled by V dd was fabricated in a 0.5-micron HP technology [4] . Another NAND/NOR gate controlled by V dd was utilized in the REPOMO32 chip [11] . A graphene gate which is configured by external voltages is capable of performing 8 different functions in its basic mode [2] .
Theoretical works on polymorphic networks such as the completeness theory can be found in [12] , [13] . Paper [5] surveys the methods proposed to design multifunctional circuits.
Polymorphic multiplexing is the most straightforward design approach. It employs a polymorphic multiplexer, a component which propagates signal A in the first mode of multifunctional gates and signal B in the second mode of multifunctional gates (Fig. 2). Consider that a target polymorphic circuit has to implement F 1 and F 2 . A conventional approach is used to synthesize a circuit implementing F 1 and another circuit implementing F 2 independently. The outputs of the circuits are then multiplexed using polymorphic multiplexers. In order to reduce the number of gates, the goal of synthesis can be to maximize the number of gates that are shared by both circuits and minimize the number of outputs that have to be equipped with polymorphic multiplexers.
CGP is capable of evolving very area-efficient multifunctional circuits; however, the approach, which evaluates 2^n input assignments for n-input circuits, is not scalable [14], [5], [15]. Only small polymorphic FIR filters were designed using this method [16]. In order to overcome this very time-consuming evaluation, we will apply a functional equivalence checking algorithm in Section VI as suggested in [6].
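The behavior of polymorphic multiplexing can be illustrated with a small behavioral model. This is only a sketch: the mode is modeled as a Boolean value s (0 for the first mode, 1 for the second), the multiplexer is described by its function rather than by the gate-level structure of Fig. 2, and the target functions f_and and f_xor are hypothetical placeholders for F 1 and F 2 .

```python
def nand_nor(a: int, b: int, s: int) -> int:
    """Polymorphic NAND/NOR gate on 1-bit inputs: NAND if s == 0, NOR if s == 1."""
    if s == 0:
        return 1 - (a & b)
    return 1 - (a | b)


def poly_mux(a: int, b: int, s: int) -> int:
    """Behavioral polymorphic multiplexer: propagates a in mode 0 and b in mode 1."""
    return a if s == 0 else b


def multiplexed_circuit(f1, f2, inputs, s):
    """Evaluate F1 and F2 independently, then select every output bit by the mode."""
    out1, out2 = f1(inputs), f2(inputs)
    return [poly_mux(x, y, s) for x, y in zip(out1, out2)]


# Hypothetical 2-input, 1-output target functions standing in for F1 and F2.
def f_and(bits):
    return [bits[0] & bits[1]]


def f_xor(bits):
    return [bits[0] ^ bits[1]]
```

In this construction both circuits are always present, which is why sharing gates between them and reducing the number of multiplexed outputs lowers the cost.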
III. CONVENTIONAL AND MULTIFUNCTIONAL IMAGE FILTERS
The target application for this work is a multifunctional image filter operating with a 3 × 3-pixel kernel. Every image filter is considered as a digital circuit of nine 8-bit inputs (the 3 × 3-pixel kernel) and a single 8-bit output, which processes grayscale (8-bits/pixel) images. CGP is employed to devise a filter composed of ordinary and multifunctional components capable of suppressing a given type of noise in the first mode and another type of noise in the second mode. In addition to noise filtering (in particular, shot noise and Gaussian noise elimination), edge detection, dilatation and erosion are other target functions.
There are well-established conventional methods allowing us to suppress a given type of noise. The shot noise (also called the salt-and-pepper noise) is usually suppressed by a (nonlinear) median filter which calculates the median value from the nine input pixels. Fig. 3 shows the area-optimal (pipeline) implementation which consists of compare-and-swap (CS) components and registers (D) [17]. The CS component calculates the minimum and maximum out of the two input values. The Gaussian noise elimination is based on a linear convolution; a simple averaging filter is shown in Fig. 4. For edge detection, the Sobel detector will be used in our case study. An ideal dilatation filter calculates the maximum out of the input pixels and similarly, an ideal erosion filter calculates the minimum out of the input pixels [18].
A straightforward implementation of a multifunctional filter will multiplex an ordinary circuit created to eliminate the first type of noise and another ordinary circuit created to eliminate the second type of noise. As structures of both filters are usually completely different, the final cost (area) is expected to be roughly a sum of the areas required for both filters.
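The conventional 3 × 3 operators described above can be sketched in software as follows. The median is built only from compare-and-swap steps, but as a plain bubble network (36 CS steps), not the area-optimal network of Fig. 3. Integer division in the averaging filter and copying of border pixels are assumptions made for this illustration.

```python
def compare_and_swap(a, b):
    """CS component: returns (minimum, maximum) of two pixel values."""
    return (a, b) if a <= b else (b, a)


def median9_cs(window):
    """Median of nine pixels using CS steps only (bubble network, not Fig. 3)."""
    p = list(window)
    for end in range(len(p) - 1, 0, -1):
        for i in range(end):
            p[i], p[i + 1] = compare_and_swap(p[i], p[i + 1])
    return p[len(p) // 2]


def mean9(window):
    """Simple averaging filter; integer division is an assumption."""
    return sum(window) // 9


def dilate9(window):
    """Ideal dilatation: maximum of the input pixels."""
    return max(window)


def erode9(window):
    """Ideal erosion: minimum of the input pixels."""
    return min(window)


def apply_filter(image, kernel_fn):
    """Slide a 3x3 window over the image interior; borders are copied (assumption)."""
    h, w = len(image), len(image[0])
    out = [row[:] for row in image]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            window = [image[y + dy][x + dx] for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
            out[y][x] = kernel_fn(window)
    return out
```

For example, `apply_filter(image, median9_cs)` removes an isolated saturated pixel from an otherwise flat region, whereas `mean9` only spreads it.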
IV. CIRCUIT EVOLUTION USING CGP
Functional-level as well as gate-level evolution of multifunctional circuits will be performed by CGP which has been applied for evolution and optimization of combinational circuits (i.e., acyclic directed graphs) for more than 10 years [19] , [20] , [21] .
To model a generic combinational circuit, a candidate solution is represented as an array of n c (columns) × n r (rows) of 2-input processing elements. All candidate circuits have n i primary inputs and n o primary outputs. Every processing element can be connected either to the output of an element placed in previous L columns or to one of the primary inputs. A processing element can perform either a single function (then it is not a multifunctional element) or two functions. In the second case, the processing element is considered as multifunctional providing that the first function is activated in the first mode and the second function is activated in the second mode. The set of available functions is denoted Γ. Table I shows typical 8-bit functions for evolution of image filters.
The chromosome is a list of integers starting with the value of c if a constant-outputting function is included in Γ. Then, it contains n c × n r triplets, each of them encoding a single processing element (input1, input2, function code). The primary inputs are addressed by 0, 1 . . . n i − 1 and processing elements by n i . . . n i + n c n r − 1. The last n o integers define the primary outputs of the circuit. An example of a chromosome and the corresponding circuit is given in Fig. 5. Polymorphic control is modeled by Boolean signal s. In the first mode (s = 0), the functions shown in the upper part of boxes are active. In the second mode (s = 1), the functions of the bottom part are active. Notice that some elements (13 and 16) are not utilized.
CGP usually employs a simple (1 + λ) evolution strategy to search in the space of candidate circuits [19]. The initial population is randomly generated or seeded using conventional designs. Then, it is evaluated and the best-scored individual is considered as the parent for a new population. However, an offspring is always chosen as the new parent if its fitness is equal to or better than that of the parent. CGP uses a mutation operator to create λ offspring of the parent to fill the new population. The mutation randomly picks µ integers and replaces them by randomly generated (but legal) values. The evolution is terminated after producing g generations.
Fig. 5. Function-level CGP supporting multifunctional elements: ... (2), min/max(3), mean/min(4), and/min(5)}. Chromosome: 5,7,3; 3,1,0; 9,4,4; 8,10,5; 11. Function codes are typed bold.
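The encoding can be made concrete with a small decoder. This is a minimal sketch under stated assumptions: the function codes in FUNCS are hypothetical (they are not the codes of Table I), the optional leading constant c is omitted, and elements are evaluated in address order, which is valid because every element may only read primary inputs or elements with lower addresses.

```python
def mean(a, b):
    """Average of two 8-bit pixels."""
    return (a + b) >> 1


# Hypothetical function codes: (function in mode s = 0, function in mode s = 1).
FUNCS = {
    0: (min, min),    # ordinary element, same function in both modes
    1: (max, max),
    2: (mean, mean),
    3: (min, max),    # multifunctional min/max element
    4: (mean, min),   # multifunctional mean/min element
}


def evaluate(chromosome, n_i, n_o, inputs, s):
    """Evaluate a chromosome of (in1, in2, function) triplets in mode s."""
    genes, outputs = chromosome[:-n_o], chromosome[-n_o:]
    values = list(inputs)                      # addresses 0 .. n_i - 1
    for k in range(0, len(genes), 3):
        in1, in2, f = genes[k:k + 3]
        values.append(FUNCS[f][s](values[in1], values[in2]))
    return [values[o] for o in outputs]


def active_nodes(chromosome, n_i, n_o):
    """Addresses of processing elements that actually contribute to an output."""
    genes = chromosome[:-n_o]
    stack = [o for o in chromosome[-n_o:] if o >= n_i]
    used = set()
    while stack:
        node = stack.pop()
        if node in used:
            continue
        used.add(node)
        k = 3 * (node - n_i)
        stack.extend(a for a in genes[k:k + 2] if a >= n_i)
    return sorted(used)


# Illustrative chromosome for n_i = 9, n_o = 1: elements 9, 10, 11 and output 10.
CHROMOSOME = [0, 1, 3,   9, 2, 3,   4, 5, 0,   10]
```

With this chromosome, the output is the minimum of inputs 0, 1 and 2 in the first mode and their maximum in the second mode; element 11 is encoded but not utilized.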
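The search loop can be sketched generically. The code below is an illustration only, assuming that fitness is minimized (as with an error measure) and that a hypothetical callback legal_values(pos) supplies the admissible values of gene pos; the toy fitness function is not related to image filtering.

```python
import random


def one_plus_lambda(parent, fitness, legal_values, lam=4, mu=1,
                    generations=200, rng=None):
    """(1 + lambda) evolution strategy that minimizes `fitness`."""
    rng = rng or random.Random(0)
    best, best_fit = list(parent), fitness(parent)
    for _ in range(generations):
        new_parent, new_fit = best, best_fit
        for _ in range(lam):
            child = list(best)
            # Mutation: pick mu genes and replace them by random legal values.
            for pos in rng.sample(range(len(child)), mu):
                child[pos] = rng.choice(legal_values(pos))
            f = fitness(child)
            # An offspring that is equally fit or better replaces the parent.
            if f <= new_fit:
                new_parent, new_fit = child, f
        best, best_fit = new_parent, new_fit
    return best, best_fit


# Toy problem (assumption): approach a fixed integer vector.
TARGET = [3, 1, 4, 1, 5, 9, 2, 6]


def toy_fitness(chromosome):
    return sum(abs(c - t) for c, t in zip(chromosome, TARGET))


START = [0] * len(TARGET)
BEST, BEST_FIT = one_plus_lambda(START, toy_fitness, lambda pos: range(10))
```

Accepting offspring of equal fitness lets the search drift across neutral regions of the search space instead of stalling at the current parent.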
Fitness function is application specific. In some cases, all possible input assignments are generated and resulting vectors are compared with requested vectors. The fitness value is then the number of correctly calculated bits. In other cases, only a subset of input vectors is utilized or specific procedures such as simulation of electronic properties [22] or functional equivalence checking [20] are applied. Multifunctional circuits have to be evaluated in all supported modes of operation.
We will present particular setting of CGP parameters and application-specific fitness functions in Sections V and VI.
V. EVOLUTION OF MULTI-FUNCTION FILTERS AT RTL

A. CGP Setting and Fitness Function
All candidate circuits have nine 8-bit primary inputs and one 8-bit primary output, i.e. n i = 9, n o = 1. Processing elements accept two 8-bit inputs and provide a single 8-bit output. Multifunctional elements are composed of two functions taken from Table I . A subset of functions used in a particular experiment will be denoted R. Figure 5 shows an example of a candidate circuit and its encoding.
Similarly to evolution of single-purpose filters, the original uncorrupted (training) image is needed to determine the fitness value. The goal of CGP is to find a circuit minimizing the difference between the original image and the output of the filter in both modes. The quality of filtering in mode k ∈ {1, 2} is expressed as the mean absolute error per pixel (mdpp):
$$\mathrm{mdpp}_k = \frac{1}{N^2} \sum_{i=1}^{N} \sum_{j=1}^{N} \left| B_k(i,j) - C_k(i,j) \right| \qquad (1)$$
The ideal image which we are attempting to reach in the first mode (second mode) is denoted by C 1 (C 2 ). The filtered image which was obtained in the first mode (second mode) is denoted by B 1 (B 2 ). Finally, N × N denotes the size of the image. Figure 6 shows the ideal version of the training image. Note that the functionality is the only objective applied in this paper. The circuit cost is controlled only indirectly by setting the maximum number of processing elements (n r · n c ).
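The mean absolute error per pixel follows directly from the definitions of B and C. The sketch below assumes that the error is normalized by the full number of pixels and, in the hypothetical helper two_mode_error, that both modes are weighted equally by summing their errors; the text does not confirm either choice.

```python
def mdpp(filtered, ideal):
    """Mean absolute difference per pixel between two equally sized images."""
    n_pixels = len(ideal) * len(ideal[0])
    total = sum(abs(b - c)
                for row_b, row_c in zip(filtered, ideal)
                for b, c in zip(row_b, row_c))
    return total / n_pixels


def two_mode_error(b1, c1, b2, c2):
    """Assumed combination of both modes: sum of the per-mode mdpp values."""
    return mdpp(b1, c1) + mdpp(b2, c2)
```

A value of 0.0 means that the filtered image matches the ideal image exactly in that mode.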
Experiments were performed for three target multifunctional filters. We started with the following setting of CGP parameters: λ = 4; µ = 1 integer; the number of generations g = 20,000; the number of independent runs r = 40; all possible functions and their combinations from Table I were allowed in R. We compared mdpp for different settings of n c , n r and L. In the following subsections, results are given for the training image.
B. Case study 1: Dilatation/Erosion
In order to test whether the basic version of the proposed method works, we first evolved a simple dilatation/erosion filter. It is a very good candidate for the multifunctional approach since an ideal dilatation filter calculates the maximum out of the input pixels and similarly, an ideal erosion filter calculates the minimum out of the input pixels [18]. Figure 7 shows the best evolved filter, which is very close to the expected solution. Table II gives the best mdpp (the best run) and the average mdpp (the average out of 40 runs) in both modes for six different CGP configurations. The best solution was obtained for the smallest CGP array.
C. Case study 2: Edges/Shots
The second objective was to evolve a filter performing edge detection in the first mode and salt and pepper noise (5%) elimination in the second mode (the Edges/Shots filter, for short). In order to create a training image for edge detection, we applied Sobel detector on the ideal image and used the resulting image as the target for evolution. As the edge detector (see e.g. [18] ) and shot noise filter (Fig. 3) have very different structures, we expected that obtained circuits will be more complex than the dilatation/erosion filter.
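The preparation of training data for this case study can be sketched as follows. The exact Sobel variant and noise generator are not specified in the text, so the choices here are assumptions: the edge magnitude is approximated by |Gx| + |Gy| clipped to 255, border pixels of the edge image are set to 0, and each pixel is corrupted independently with the given probability.

```python
import random


def sobel(image):
    """Sobel edge image using |Gx| + |Gy| clipped to 0..255 (assumed variant)."""
    h, w = len(image), len(image[0])
    p = image
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = ((p[y - 1][x + 1] + 2 * p[y][x + 1] + p[y + 1][x + 1])
                  - (p[y - 1][x - 1] + 2 * p[y][x - 1] + p[y + 1][x - 1]))
            gy = ((p[y + 1][x - 1] + 2 * p[y + 1][x] + p[y + 1][x + 1])
                  - (p[y - 1][x - 1] + 2 * p[y - 1][x] + p[y - 1][x + 1]))
            out[y][x] = min(255, abs(gx) + abs(gy))
    return out


def salt_and_pepper(image, fraction=0.05, rng=None):
    """Replace roughly `fraction` of the pixels by 0 or 255 at random."""
    rng = rng or random.Random(0)
    out = [row[:] for row in image]
    for y in range(len(out)):
        for x in range(len(out[0])):
            if rng.random() < fraction:
                out[y][x] = rng.choice((0, 255))
    return out
```

In this setting, `sobel(ideal)` would serve as the target of the first mode, while `salt_and_pepper(ideal)` would be the input and `ideal` the target of the second mode.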
We can see from Table III that while the best mdpp was obtained for the largest CGP array, the average mdpp is minimal for the smallest CGP array. Hence we performed additional experiments with the 6×6 array where we modified the mutation rate, reduced the function set and increased the number of generations. Table IV shows that significantly better results were obtained for the reduced function set and higher mutation rate. The best-performing filter is shown in Figure 8 . Its performance is demonstrated in Fig 9. 
D. Case study 3: Gauss/Shots
The third multifunctional filter we evolved is capable of suppressing the Gaussian noise (σ = 0.1 for inputs normalized to the interval [0.0, 1.0]) in the first mode and the shot noise (the 5% salt and pepper noise) in the second mode (the Gauss/Shots filter, for short). On the basis of previous experiments, we modified the CGP setting to µ = 2, r = 20, g = 40,000. A comparison of results obtained for six CGP configurations is given in Table V. Additional results reported in Table VI indicate that the restricted function set, L = 4, n c = 7 and n r = 4 give the best minimum as well as the best average mdpp.
The best-evolved filter is shown in Fig. 10 . While the mean function is the most frequent one used in the first mode, the functionality of the second mode is based on computing the minimum and maximum. We expected this type of function utilization. 
E. Filtering Quality and Design Time
1) Filtering Quality:
The best filters that we presented in previous subsections were compared in both modes with conventional filters. Table VII gives the average mdpp obtained using a set of 16 test images shown in [8]. The evolved salt-and-pepper filters exhibit lower mdpp than the median filter (Fig. 3). The average mdpp of the evolved filter for Gaussian noise is slightly worse than the mdpp obtained for the conventional averaging filter. Results are not given for dilatation, erosion and edge detection because the corresponding conventional operators were directly utilized to create training images (i.e. mdpp is 0.0).
2) Time of Evolution:
We measured the time of evolution for the dilatation/erosion filter (λ = 4, g = 500, µ = 1, L = 2, n c = n r = 4) using a laptop equipped with the Pentium M 1.8 GHz processor. Figure 11 shows that the time of evolution significantly depends on the training image size. A typical experiment utilizing a 256×256-pixel image and running for 20,000 generations then took approx. 48 min.
VI. GATE-LEVEL OPTIMIZATION OF PROCESSING ELEMENTS FOR MULTI-FUNCTION FILTERS
In this section, we will explore possible gate-level implementations of the 16-bit input (2 × 8 bits) and 8-bit output processing elements (such as min/max) that represent building blocks of evolved filters shown in Figs. 7, 8 and 10. Resulting implementations of evolved filters will then be compared with conventional solutions based on multiplexing of the common filters (e.g. median filter and Sobel filter) as presented in Section III.
A. Multiplexer-based method
Figure 12 (left) shows the multiplexer-based approach proposed to implement the multifunctional processing elements. The construction has three steps.
(1) Elementary functions from Table I are described in VHDL and synthesized using LeonardoSpectrum (LS) which can utilize the gates from Table VIII (except NAND/NOR). The implementation cost of resulting gate-level circuits is expressed as a relative area and in absolute gate numbers in Table I . (2) Two elementary functions (E1 and E2 in Fig. 12 ) are then connected using standard multiplexers in order to obtain a single multifunctional processing element controlled by selector s.
(3) Implementations of evolved filters (Fig. 7, 8 and 10 ) are composed of the processing elements created in step (2) . A particular filter selection depends on setting of the selector s. The final implementation cost of the filters is given by the CoMux column in Table IX . The gate-level netlists of processing elements can further be optimized, e.g. by conventional tools such as ABC [23] . The cost of filters composed of the processing elements that were optimized using ABC is given by the CoMux-ABC column in Table IX .
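Step (2) of this construction can be illustrated with a bit-level model. It is a sketch only: the elementary functions E1 and E2 are computed behaviorally (here by Python's min and max) rather than from synthesized netlists, and each output bit passes through a standard 2:1 multiplexer described by its usual AND/OR form.

```python
def mux_bit(a, b, s):
    """Standard 2:1 multiplexer on single bits: (a AND NOT s) OR (b AND s)."""
    return (a & (1 - s)) | (b & s)


def mux8(a, b, s):
    """8-bit multiplexer composed of eight 1-bit multiplexers."""
    out = 0
    for i in range(8):
        out |= mux_bit((a >> i) & 1, (b >> i) & 1, s) << i
    return out


def processing_element(e1, e2):
    """Combine two elementary 8-bit functions into one element controlled by s."""
    def pe(x, y, s):
        return mux8(e1(x, y) & 0xFF, e2(x, y) & 0xFF, s)
    return pe


# Example: the min/max processing element.
min_max = processing_element(min, max)
```

Both elementary functions are always computed and only the selection is switched, so the cost of such an element is roughly the cost of E1 plus E2 plus the multiplexer.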
B. Polymorphic multiplexer-based method
The idea of polymorphic multiplexing of elementary functions in the processing element is shown in Fig. 12 (right). The first step of construction, the gate-level design of elementary functions, is the same as in the previous procedure. Then two elementary functions (E1 and E2) are connected using polymorphic multiplexers in order to obtain a single multifunctional processing element. Because every polymorphic multiplexer contains two polymorphic gates (Fig. 2), each multifunctional processing element contains 16 polymorphic gates. The application of this basic construction procedure leads to multifunctional filters whose implementation cost is given by the Polymux column in Table IX. The gate-level netlists of the processing elements can further be optimized. However, as neither ABC nor LS supports polymorphic gates, we can optimize only the implementations of interconnected elementary functions E1 and E2 in the processing element. Table X (the seed columns) shows the number of gates (including polymorphic gates) in the processing elements obtained after an optimization conducted using ABC and LS, respectively. It can be seen that ABC and LS provide very similar results on average. The optimized implementations of the processing elements were utilized as building blocks for evolved filters. Table IX, the Polymux-ABC column, shows that a small improvement has been obtained in all three cases w.r.t. the Polymux approach.
C. Postsynthesis optimization using CGP
In order to further optimize the multifunctional processing elements, we considered the resulting processing elements that were obtained in the previous section as a seed for the CGP circuit optimizer. The CGP array contains n c × 1 nodes, where n c is the number of gates in the seed circuit, n i = 16, n o = 8, λ = 1, L = n c and µ = 2. The values of parameters are chosen according to [6]. The set of functions includes ordinary two-input logic functions, buffer, inverter and the NAND/NOR gate, i.e. Γ = {BUF, NOT, AND, OR, XOR, NAND, NOR, NAND/NOR, ZERO, ONE}. Similarly to [5], we have used just one polymorphic function, NAND/NOR.
1) Fitness Function:
A straightforward approach to the evaluation of a candidate polymorphic combinational circuit requires applying 2^(n i) assignments to the inputs, calculating the number of correctly produced bits for the first mode and repeating these two steps for the second mode. This is very time-consuming for our target 16-input/8-output multifunctional circuits. In order to accelerate the evaluation, a SAT-based equivalence checking algorithm is applied [6]. This algorithm assumes that CGP is seeded using a fully functional polymorphic circuit. The seed is utilized as a reference solution for the SAT-based equivalence checking algorithm.
The fitness evaluation works as follows. The reference circuit U as well as the candidate circuit V (which is created by mutation from the parent) is set to the first mode. A new auxiliary circuit W 1 is composed of the reference circuit in the first mode (circuit U 1 ), the candidate circuit in the first mode (circuit V 1 ) and a miter (a set of XOR gates followed by the OR detector), see Fig. 13. Circuit W 1 is transformed into one Boolean formula in conjunctive normal form (CNF) which is unsatisfiable if and only if circuits U 1 and V 1 are functionally equivalent [24]. The transformation to CNF is conducted gate by gate using Tseitin's algorithm [25]. If U 1 and V 1 are not functionally equivalent then the fitness evaluation is finished and CGP proceeds with another candidate circuit. Otherwise, circuits U and V are set to the second mode and the process is repeated. If the circuits are also functionally equivalent in the second mode then the fitness value is given by the number of gates. Otherwise, the fitness value is -1, i.e. the worst one.
2) Results:
MiniSAT 2 (version 070721) was used as the SAT solver [26] because it can easily be embedded into a custom application. The experiments were carried out on a cluster of Intel Xeon E5345 2.33 GHz processors, which makes it possible to run several experiments in parallel. Two seeds were compared: one coming from the ABC tool and another one from the LS tool. For each seed and processing element, 25 parallel runs of CGP were performed. Every 15 minutes, all runs were re-seeded using the best individual obtained so far. The evolution was stopped when no progress had been observed in the last 15 minutes. The minimum, average and maximum time spent by a single run was 1.25, 1.61 and 4 hours, respectively. Table X shows the number of gates in the best circuit, the number of polymorphic gates and the resulting relative area.
We can observe that the averages calculated for our set of processing elements depend only insignificantly on the chosen seed.
As an example, Figure 14 shows the progress of optimization of the nand/min processing element seeded using a circuit coming from the ABC tool.
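The principle of SAT-based equivalence checking can be demonstrated on toy circuits. The sketch below is not the implementation of [6]: a circuit is a hypothetical pair (gates, outputs) with gates given as (operation, input, input) over signal indices, the clauses follow the standard Tseitin encoding, the two circuits are joined by XOR gates whose outputs must not all be 0, and a tiny DPLL routine stands in for a production SAT solver.

```python
def gate_clauses(op, z, a, b):
    """Tseitin clauses forcing variable z to equal op(a, b)."""
    if op == "AND":
        return [[-z, a], [-z, b], [z, -a, -b]]
    if op == "OR":
        return [[z, -a], [z, -b], [-z, a, b]]
    if op == "NAND":
        return [[z, a], [z, b], [-z, -a, -b]]
    if op == "NOR":
        return [[-z, -a], [-z, -b], [z, a, b]]
    if op == "XOR":
        return [[-z, a, b], [-z, -a, -b], [z, -a, b], [z, a, -b]]
    if op == "NOT":                       # second input is ignored
        return [[z, a], [-z, -a]]
    raise ValueError(op)


def encode(gates, outputs, n_in, s, first_var):
    """Encode one circuit in mode s; primary inputs use variables 1..n_in."""
    var, clauses, nxt = list(range(1, n_in + 1)), [], first_var
    for op, a, b in gates:
        if op == "NAND/NOR":              # polymorphic gate resolved by the mode
            op = "NAND" if s == 0 else "NOR"
        clauses += gate_clauses(op, nxt, var[a], var[b])
        var.append(nxt)
        nxt += 1
    return clauses, [var[o] for o in outputs], nxt


def miter_cnf(ref, cand, n_in, s):
    """CNF that is satisfiable iff ref and cand differ on some input in mode s."""
    c1, o1, nxt = encode(*ref, n_in, s, n_in + 1)
    c2, o2, nxt = encode(*cand, n_in, s, nxt)
    clauses, diffs = c1 + c2, []
    for x, y in zip(o1, o2):
        clauses += gate_clauses("XOR", nxt, x, y)
        diffs.append(nxt)
        nxt += 1
    clauses.append(diffs)                 # OR detector: some output pair differs
    return clauses


def simplify(clauses, lit):
    """Assign lit = true; return None if an empty clause (conflict) appears."""
    out = []
    for c in clauses:
        if lit in c:
            continue
        reduced = [l for l in c if l != -lit]
        if not reduced:
            return None
        out.append(reduced)
    return out


def satisfiable(clauses):
    """Tiny DPLL solver (unit propagation and branching) for toy instances."""
    while True:
        units = [c[0] for c in clauses if len(c) == 1]
        if not units:
            break
        clauses = simplify(clauses, units[0])
        if clauses is None:
            return False
    if not clauses:
        return True
    lit = clauses[0][0]
    for choice in (lit, -lit):
        reduced = simplify(clauses, choice)
        if reduced is not None and satisfiable(reduced):
            return True
    return False


def fitness(ref, cand, n_in):
    """Number of gates if equivalent to ref in both modes, otherwise -1."""
    for s in (0, 1):
        if satisfiable(miter_cnf(ref, cand, n_in, s)):
            return -1
    return len(cand[0])


# Toy example: a redundant seed and two candidates (signals 0, 1 are inputs).
SEED = ([("NAND/NOR", 0, 1), ("NOT", 2, 2), ("NOT", 3, 3)], [4])
GOOD = ([("NAND/NOR", 0, 1)], [2])
BAD = ([("NAND", 0, 1)], [2])
```

Here GOOD matches the seed in both modes and is accepted with a cost of one gate, whereas BAD agrees with the seed only in the first mode and is rejected once the second mode is checked.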
The most area-efficient implementations of processing elements were then utilized to create the image filters according to Fig. 7, 8 and 10 . The Polymux ABC-CGP column in Table IX shows a significant reduction in the area in comparison with the Polymux-ABC method. 
D. Comparison with conventional filters
As proposed in Section III, the conventional implementations of filters can be multiplexed to develop multifunctional filters. Hence we synthesized gate-level implementations of the 9-input median filter, 9-input averaging filter (9-Mean), 9-input dilatation and 9-input erosion filter. Subsequently, multifunctional filters were composed using multiplexers. Note that the total area of the 8-bit multiplexer is 10.64. Table XI gives the implementation cost of conventional filters and multiplexed conventional filters.
By comparison of the relative costs, it can be seen that evolved filters that were implemented using multiplexing at the level of processing elements (Table IX) occupy much smaller area than multiplexed conventional filters (Table XI) . If the CGP optimization is taken into account, polymorphic multiplexing-based implementations are comparable to implementations utilizing multiplexed processing elements. It is important to emphasize here that the implementation cost of polymorphic multiplexers is considered as relatively high and more compact implementations are expected in the future. 
VII. CONCLUSIONS
A new two-step method for the design of multifunctional image filters has been proposed in this paper. In the first step, we applied CGP to evolve image filters composed of multifunctional processing elements operating over 8 bits. This approach extended our previous work on image filter evolution. It was shown that CGP can fit two different filtering tasks to a single acyclic directed graph and that the resulting filters exhibit the desired filtering quality. In the second step, we tested several methods to implement processing elements involved in evolved filters in order to minimize the area on a chip. Simple multiplexing proved to be a very efficient method to accomplish this task. By combining CGP with a SAT-based fitness function, we were able to find interesting solutions for circuits with polymorphic gates even when a very pessimistic implementation cost was assumed for polymorphic multiplexers.
VIII. ACKNOWLEDGMENTS
