Abstract-The paper deals with the evolutionary post-synthesis optimization of complex combinational circuits with the aim of reducing the area on a chip as much as possible. In order to optimize complex circuits, Cartesian Genetic Programming (CGP) is employed with a fitness function based on a formal equivalence checking algorithm rather than on evaluating all possible input assignments. The standard selection strategy of CGP is modified to be more explorative and thus more agile in very rugged fitness landscapes. Experiments on the LGSynth93 benchmark circuits show that the modified selection strategy leads to more compact circuits in roughly 50% of cases. The average area improvement is 24% with respect to the results of conventional synthesis. The delay of the optimized circuits is also analyzed.
I. INTRODUCTION
Cartesian Genetic Programming (CGP) can be considered one of the most efficient methods for evolutionary design and optimization of digital combinational circuits [1], [2]. Miller and his collaborators showed more than 10 years ago that small combinational circuits such as 4-bit multipliers can be designed in a new way, often saving around 20% of gates in comparison with state-of-the-art conventional synthesis tools. This seminal work has been extended in several ways. The most recent research in the area of digital circuits includes the evolution of novel standard cell libraries for future technology nodes [3] and a post-synthesis optimization of complex combinational circuits using formal verification principles [4]. While the achievable quality of the resulting design or optimization is usually very high, the computational time required to achieve that result is the main drawback.
The fitness computation is the most time-consuming procedure. The basic method, which tests $2^n$ input vectors for an $n$-input circuit, is not scalable. Hence, good results were produced only for combinational circuits with 10-20 inputs (depending on the particular instance) [1], [2], [5], [6], [7], [8]. Most work in this area considers the number of gates as the only criterion for optimization. Delay, area on a chip, power consumption, testability and other important properties are not usually addressed. Multiobjective optimization has been applied to very small problem instances only [9].
In this work, we will follow the path of single-criterion evolutionary synthesis with the aim of area minimization because area is still the most important criterion for many applications. Moreover, this is a classic subtask in logic optimization research, where delay and other criteria are not always considered [10]. The goal of this work is to obtain phenotypes (resulting circuits) that are as small as possible for complex combinational circuits. In order to reduce the fitness computation time, we will replace the standard approach, which applies all possible input combinations to determine the fitness value of a candidate circuit, by a formal equivalence checking algorithm, which can check relatively quickly whether a candidate solution is functionally correct even for most large-scale circuit instances (hundreds of inputs) [11].
U.S. Government work not protected by U.S. copyright
In this context, two selection strategies will be compared. We will show that selecting a parent solely on the basis of its functionality, rather than its compactness (the area on the chip), can lead to small phenotypes at the end of evolution for many of our benchmark problems. This behavior has already been observed during the evolution of small combinational circuits [12]. In order to compare and evaluate the role of the selection mechanisms, we propose to measure the number of non-destructive mutations during evolution. It is supposed that the modified selection will generate more functionally correct individuals than the standard selection of CGP.
In contrast to most works on this topic, the optimization criterion will be the estimated area on a chip instead of the number of gates. This approach will allow us to compare our results fairly with state-of-the-art synthesis tools such as ABC [13] or SIS [14]. Another issue, which is not usually addressed by the evolvable hardware community, is to what extent the delay of a combinational circuit changes when the circuit is optimized for area only. The proposed methods will be evaluated on a cluster of workstations using the LGSynth93 benchmark circuits.
The rest of the paper is organized as follows. Section II presents CGP as a method for digital circuit evolution and optimization. The modified selection mechanism for CGP is introduced in Section III. The fitness calculation based on formal equivalence checking is described in Section IV. Our proposal for combining the modified selection strategy and the equivalence checking-based fitness function is presented in Section V. The results of experiments are summarized in Section VI. Section VII discusses the obtained results and analyzes them on the basis of the measured non-destructive mutations. Finally, conclusions are given in Section VIII.
II. CIRCUIT EVOLUTION USING CGP
Cartesian Genetic Programming is a subarea of genetic programming where candidate individuals are represented as directed acyclic graphs, mutation is the main genetic operator and evolution is carried out using the $1 + \lambda$ evolution strategy [15], [1].
A. CGP
From the perspective of circuit design, a candidate circuit is represented as an array of $n_c$ (columns) $\times$ $n_r$ (rows) programmable gates. The number of inputs, $n_i$, and outputs, $n_o$, is fixed. Each gate can be connected either to the output of a gate placed in the previous $l$ columns or to one of the circuit inputs. Setting the $l$ parameter (l-back) allows the maximum circuit delay to be controlled. Feedback is not allowed. Each gate is programmed to perform one of the $n_a$-input functions defined in the set $\Gamma$ ($n_f$ denotes $|\Gamma|$). Each gate is encoded using $n_a + 1$ integers, where values $1 \ldots n_a$ are the indexes of the input connections and the last value is the function code. Every circuit is encoded using $n_c \cdot n_r \cdot (n_a + 1) + n_o$ integers. Figure 1 shows an example of a candidate circuit and its chromosome.
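The encoding described above can be illustrated with a minimal Python sketch. It assumes two-input gates ($n_a = 2$), a single row ($n_r = 1$), and a hypothetical function set `FUNCS`, since the actual function set is not listed in this section. The addressing scheme (inputs at addresses 0 to $n_i - 1$, gate $k$ at address $n_i + k$) and the names `evaluate` and `HALF_ADDER` are likewise assumptions made for illustration only.

```python
from typing import Callable, List, Sequence

# Hypothetical function set; the paper's own set is not given in this section.
FUNCS: List[Callable[[int, int], int]] = [
    lambda a, b: a & b,        # 0: AND
    lambda a, b: a | b,        # 1: OR
    lambda a, b: a ^ b,        # 2: XOR
    lambda a, b: 1 - (a & b),  # 3: NAND
]


def evaluate(chrom: Sequence[int], n_i: int, n_o: int,
             inputs: Sequence[int]) -> List[int]:
    """Evaluate a chromosome (n_a = 2, n_r = 1) on a single input vector."""
    n_gates = (len(chrom) - n_o) // 3      # each gate uses n_a + 1 = 3 integers
    values = list(inputs)                  # addresses 0 .. n_i-1 are circuit inputs
    for k in range(n_gates):
        in0, in1, fn = chrom[3 * k: 3 * k + 3]
        # No feedback: a gate may read only circuit inputs or earlier gates.
        assert in0 < n_i + k and in1 < n_i + k
        values.append(FUNCS[fn](values[in0], values[in1]))
    # The last n_o integers select which addresses drive the circuit outputs.
    return [values[addr] for addr in chrom[len(chrom) - n_o:]]


# Half adder: gate at address 2 is XOR(in0, in1), gate at address 3 is AND(in0, in1).
# Length = n_c * n_r * (n_a + 1) + n_o = 2 * 1 * 3 + 2 = 8 integers.
HALF_ADDER = [0, 1, 2,  0, 1, 0,  2, 3]
```

For example, `evaluate(HALF_ADDER, 2, 2, [1, 1])` yields the sum bit 0 and the carry bit 1. The l-back restriction is not enforced here; the sketch corresponds to the case where $l$ equals the number of columns.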
CGP operates with a population of $1 + \lambda$ individuals (typically, $\lambda$ is between 1 and 20). The initial population is constructed either randomly or by a heuristic procedure. Every new population consists of the best individual of the previous population and its $\lambda$ offspring. The offspring are created using a point mutation operator which modifies $h$ randomly selected genes of the chromosome, where $h$ is a user-defined value.
An important rule is applied when selecting the parent. When two or more individuals can serve as the parent, an individual which did not serve as the parent in the previous generation is selected as the new parent. This strategy is important because it ensures diversity of the population [16]. The algorithm terminates when the maximum number of generations is exhausted or a satisfactory solution is obtained.
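A possible form of the point mutation operator is sketched below in Python. The sketch assumes $n_a = 2$, $n_r = 1$ and full connectivity ($l = n_c$), and it uses the same hypothetical addressing as a flat integer chromosome with inputs at addresses 0 to $n_i - 1$. The function name `mutate` and its parameters are illustrative, not taken from the paper.

```python
import random
from typing import List


def mutate(chrom: List[int], n_i: int, n_o: int, n_f: int, h: int,
           rng: random.Random) -> List[int]:
    """Return a copy of chrom in which h randomly chosen genes are re-sampled.

    A re-sampled gene may receive its old value again, so fewer than h genes
    may actually differ from the parent.
    """
    child = list(chrom)
    n_gates = (len(chrom) - n_o) // 3
    for pos in rng.sample(range(len(chrom)), h):
        if pos >= 3 * n_gates:
            # Output gene: may point to any circuit input or any gate.
            child[pos] = rng.randrange(n_i + n_gates)
        elif pos % 3 == 2:
            # Function gene: any of the n_f function codes.
            child[pos] = rng.randrange(n_f)
        else:
            # Connection gene of gate pos // 3: only inputs or earlier gates,
            # which keeps the circuit free of feedback.
            child[pos] = rng.randrange(n_i + pos // 3)
    return child
```

The parent chromosome is left untouched, so the same parent can be mutated repeatedly to produce all $\lambda$ offspring of a generation.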
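The search loop with this parent-selection rule can be sketched as follows. This is a generic Python illustration, not the authors' implementation: `fitness` and `mutate` are caller-supplied callables, higher fitness is assumed to be better, and the rule is interpreted in the usual way, namely that an offspring whose fitness equals or exceeds that of the parent replaces the parent.

```python
import random
from typing import Callable, TypeVar

T = TypeVar("T")


def one_plus_lambda(seed: T,
                    fitness: Callable[[T], float],
                    mutate: Callable[[T, random.Random], T],
                    lam: int, generations: int,
                    rng: random.Random) -> T:
    """(1 + lambda) search in which equally scored offspring replace the parent."""
    parent, f_parent = seed, fitness(seed)
    for _ in range(generations):
        scored = []
        for _ in range(lam):
            child = mutate(parent, rng)
            scored.append((fitness(child), child))
        f_top = max(f for f, _ in scored)
        # ">=" rather than ">": on a tie the offspring is preferred over the
        # previous parent, which keeps the search moving across neutral regions.
        if f_top >= f_parent:
            candidates = [x for f, x in scored if f == f_top]
            parent, f_parent = rng.choice(candidates), f_top
    return parent
```

Using a strict comparison instead would leave the parent unchanged on ties and would remove the neutral drift that the rule is meant to provide.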
B. Fitness Function
In the case of digital circuit evolution, the fitness value of a candidate circuit is defined as [17]:
$$f = \begin{cases} b & \text{when } b < n_o 2^{n_i}, \\ b + (n_c n_r - z) & \text{otherwise}, \end{cases} \tag{1}$$
where $b$ is the number of correct output bits obtained as the response to all possible assignments to the inputs, $z$ denotes the number of gates utilized in a particular candidate circuit and $n_c n_r$ is the total number of available gates. It can be seen that the last term, $n_c n_r - z$, is considered only if the circuit behavior is perfect, i.e. $b = b_{max} = n_o 2^{n_i}$. Alternatively, we can replace the number of utilized gates by the number of utilized transistors, which is a more precise measure because the implementation costs of gates differ [18]. We can observe that the evolution first has to discover a perfectly working solution, during which the size of the circuit is not important. Then, the number of gates is optimized. Delay or power consumption may be optimized similarly.
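The two-stage fitness described above (the value $b$ while the circuit is incorrect, and $b + (n_c n_r - z)$ once $b = b_{max}$) can be sketched in Python with exhaustive simulation. The sketch assumes $n_a = 2$, $n_r = 1$, a hypothetical three-function set, and a flat integer chromosome with inputs at addresses 0 to $n_i - 1$. The helper names `simulate`, `used_gates` and `fitness` are illustrative.

```python
from itertools import product
from typing import Callable, List, Sequence

# Hypothetical two-input function set (an assumption, not the paper's set).
FUNCS: List[Callable[[int, int], int]] = [
    lambda a, b: a & b,  # 0: AND
    lambda a, b: a | b,  # 1: OR
    lambda a, b: a ^ b,  # 2: XOR
]


def simulate(chrom: Sequence[int], n_i: int, n_o: int,
             vec: Sequence[int]) -> List[int]:
    """Evaluate every gate in order and read the output addresses."""
    values = list(vec)
    for k in range((len(chrom) - n_o) // 3):
        in0, in1, fn = chrom[3 * k: 3 * k + 3]
        values.append(FUNCS[fn](values[in0], values[in1]))
    return [values[a] for a in chrom[len(chrom) - n_o:]]


def used_gates(chrom: Sequence[int], n_i: int, n_o: int) -> int:
    """Count the gates reachable from the outputs (the value z)."""
    stack = list(chrom[len(chrom) - n_o:])
    seen = set()
    while stack:
        addr = stack.pop()
        if addr >= n_i and addr not in seen:
            seen.add(addr)
            k = addr - n_i
            stack.extend(chrom[3 * k: 3 * k + 2])  # the two connection genes
    return len(seen)


def fitness(chrom: Sequence[int], n_i: int, n_o: int,
            target: Callable[[Sequence[int]], Sequence[int]]) -> int:
    """Correct output bits b, plus the unused-gate bonus once b == b_max."""
    n_gates = (len(chrom) - n_o) // 3
    b = 0
    for vec in product((0, 1), repeat=n_i):   # all 2^n_i input assignments
        out = simulate(chrom, n_i, n_o, vec)
        b += sum(int(o == t) for o, t in zip(out, target(vec)))
    b_max = n_o * 2 ** n_i
    if b < b_max:
        return b
    return b + (n_gates - used_gates(chrom, n_i, n_o))
```

The loop over all $2^{n_i}$ assignments is exactly the part that does not scale to circuits with many inputs.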
Although many new designs have been discovered using the standard CGP, the method is not applicable to the design or optimization of large circuits because of the time-consuming fitness evaluation. A multi-objective formulation of the circuit evolution problem was also proposed, but it was evaluated using small benchmark problems only [9].
III. SELECTION MECHANISMS IN CGP
In the proposed modification of the selection mechanism, there is only one requirement for the selection of the parent individual once a functionally correct circuit has been discovered: the parent must be fully functional. Note that in the standard CGP, the parent is always the best circuit discovered so far, i.e. functionality as well as size are considered. The selection procedure can be formalized as follows.
Let $p$ denote the highest-scored individual, with the fitness value $f_p$. The new selection strategy applies only when the number of gates is being optimized, i.e. when the fitness value of the best individual is higher than or equal to $b_{max}$. Otherwise, the algorithm works as the standard CGP.
As the best individual found so far will not be copied to the new population automatically, it is necessary to store it in an auxiliary variable. Let $\beta$ denote the best discovered solution and let $f_\beta$ be its fitness value. In the first population, $\beta$ is initialized using $p$.
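Assume that $x_1 \ldots x_\lambda$ are individuals (with fitness values $f_{x_1} \ldots f_{x_\lambda}$) created from the parental solution $p$ using the mutation operator and that $f_\beta \geq b_{max}$. Because the best individual $\beta$ and the parental individual $p$ are not always identical, we have to determine their new instances $\beta'$ and $p'$ separately. The best-discovered solution is defined as:
$$\beta' = \begin{cases} \beta & \text{when } \forall i \in \{1 \ldots \lambda\}: f_{x_i} \leq f_\beta, \\ x_j & \text{otherwise}, \end{cases} \tag{2}$$
where $x_j$ is the highest-scored individual for which $f_{x_j} > f_\beta$ holds. If multiple individuals in $\{x_1 \ldots x_\lambda\}$ satisfy this condition, one of them is chosen randomly.
The new parental individual is defined as:
$$p' = \begin{cases} p & \text{when } \forall i \in \{1 \ldots \lambda\}: f_{x_i} < b_{max}, \\ x_j & \text{otherwise}, \end{cases} \tag{3}$$
where $x_j$ is a randomly selected individual from those in $\{x_1 \ldots x_\lambda\}$ which obtained a fitness score higher than or equal to $b_{max}$.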
In other words, the new parent must be a fully functional circuit; however, the number of gates is not important for its selection. Note that the result of evolution is no longer $p$ but $\beta$. Paper [12] showed that the modified selection strategy leads to smaller circuits than the strategy used in the standard CGP for the majority of tested circuits; however, only small problem instances have been studied so far.
IV. FAST FITNESS EVALUATION
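One selection step of the modified strategy can be sketched in Python as follows. The sketch assumes that each individual is carried as a `(fitness, individual)` pair, that higher fitness is better, and that the best solution found so far is already fully functional. The function name `select_modified` and its signature are illustrative only.

```python
import random
from typing import List, Tuple, TypeVar

T = TypeVar("T")
Scored = Tuple[float, T]   # (fitness value, individual)


def select_modified(parent: Scored, best: Scored, offspring: List[Scored],
                    b_max: float, rng: random.Random) -> Tuple[Scored, Scored]:
    """Return (new_parent, new_best) for one generation.

    Assumes the best-so-far solution is already functional (best[0] >= b_max).
    """
    # Best-so-far: replaced only by a strictly better offspring; ties among
    # the highest-scored improving offspring are broken randomly.
    improving = [o for o in offspring if o[0] > best[0]]
    if improving:
        top = max(f for f, _ in improving)
        best = rng.choice([o for o in improving if o[0] == top])
    # Parent: any fully functional offspring qualifies, regardless of size.
    functional = [o for o in offspring if o[0] >= b_max]
    if functional:
        parent = rng.choice(functional)
    return parent, best
```

Because the parent may become worse than the stored best solution, the caller has to report the best-so-far solution, not the final parent, as the result of the run.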
Since there are many conventional circuit design methods, we can easily obtain a (typically non-optimal) implementation of the target circuit and use it to seed the initial population for CGP. Although the conventional solution might bias the search algorithm toward some subareas of the search space, there are benefits, namely a reduction of the design time in comparison with a search from scratch. It is important for us that the conventional solution can also be utilized as a reference solution for an equivalence checking algorithm. Instead of applying all possible assignments to the inputs (see eq. 1), every new candidate circuit is compared with the reference circuit in order to determine its functionality. Functionally incorrect candidate circuits are discarded. If a candidate circuit is fully functional, its fitness value is given by the number of gates or transistors.
Determining whether two Boolean functions are functionally equivalent is a fundamental problem in formal verification. Although functional equivalence checking is a co-NP-complete problem, several approaches have been proposed to reduce the computational requirements for practical circuit instances.
Most of the proposed techniques are based on representing a circuit by means of its canonical representation. Generally, two Boolean functions are equivalent if and only if the canonical representations of their output functions are equivalent. Reduced Ordered Binary Decision Diagrams (ROBDDs) are a widely used canonical representation in formal verification [19]. Some of the methods developed to determine whether two ROBDDs are isomorphic rely on graph-based algorithms. Other methods combine the ROBDDs using the XOR operation and check whether the resulting ROBDD is the constant node (zero). And-or-invert graphs represent another canonical representation with similar properties. All these graph-based approaches rely on the number of nodes in the resulting graph being relatively small; otherwise, the time of the ROBDD construction as well as the time of comparison will be enormous. In practice, these methods are rarely implemented directly without any further circuit preprocessing. The high consumption of memory resources has motivated researchers to look for alternative methods. Since satisfiability (SAT) solvers have improved significantly during the last few years, SAT-based equivalence checking has become a promising alternative to BDD-based checking.
B. SAT-based Equivalence Checking
SAT-based equivalence checking was applied to CGP in [4], [11]. As Figure 2 shows, the circuits to be checked are compared using a set of XOR gates followed by an OR detector (a so-called miter). In order to disprove the equivalence, it is necessary to identify at least one XOR gate which evaluates to 1 for some input assignment, i.e. to find an input assignment for which the corresponding outputs $y_i$ and $y'_i$ provide different values and thus $z_i = 1$. The resulting circuit, which is composed of Circuit A (the reference circuit), Circuit B (the candidate circuit) and the miter, is transformed into one Boolean formula in conjunctive normal form (CNF) which is unsatisfiable if and only if circuits A and B are functionally equivalent [20]. The transformation to CNF is conducted gate by gate using Tseitin's algorithm [21]. Table I contains the CNF representation for selected gates.
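The miter construction and the gate-by-gate CNF transformation can be illustrated with a small Python sketch. It uses the standard Tseitin clauses for AND, OR and XOR gates with DIMACS-style integer literals, and a brute-force satisfiability check stands in for a real SAT solver such as MiniSAT. All function and variable names are illustrative, and the single-output example makes the OR detector unnecessary.

```python
from itertools import product
from typing import List

Clause = List[int]  # DIMACS-style literals: +v is variable v, -v its negation


def tseitin_and(y: int, a: int, b: int) -> List[Clause]:
    """Clauses enforcing y <-> (a AND b)."""
    return [[-y, a], [-y, b], [y, -a, -b]]


def tseitin_or(y: int, a: int, b: int) -> List[Clause]:
    """Clauses enforcing y <-> (a OR b)."""
    return [[y, -a], [y, -b], [-y, a, b]]


def tseitin_xor(y: int, a: int, b: int) -> List[Clause]:
    """Clauses enforcing y <-> (a XOR b)."""
    return [[-y, a, b], [-y, -a, -b], [y, -a, b], [y, a, -b]]


def satisfiable(clauses: List[Clause], n_vars: int) -> bool:
    """Brute-force stand-in for a SAT solver; usable only for tiny formulas."""
    for bits in product((False, True), repeat=n_vars):
        if all(any(bits[abs(l) - 1] == (l > 0) for l in c) for c in clauses):
            return True
    return False


# Variables 1 and 2 are the shared inputs a and b.
# Circuit A: v3 = a XOR b.
# Circuit B: v4 = a OR b, v5 = a AND b, v6 = v4 AND (NOT v5), also an XOR.
# Miter:     v7 = v3 XOR v6, asserted true by the unit clause [7].
equivalent_cnf = (tseitin_xor(3, 1, 2) + tseitin_or(4, 1, 2)
                  + tseitin_and(5, 1, 2) + tseitin_and(6, 4, -5)
                  + tseitin_xor(7, 3, 6) + [[7]])

# Circuit B': v4 = a OR b, which differs from XOR when a = b = 1.
different_cnf = (tseitin_xor(3, 1, 2) + tseitin_or(4, 1, 2)
                 + tseitin_xor(5, 3, 4) + [[5]])
```

Here `satisfiable(equivalent_cnf, 7)` is false, so the two circuits are equivalent, whereas `satisfiable(different_cnf, 5)` is true and any satisfying assignment is a counterexample input.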
C. Optimized Approach for CGP
Although the SAT-based equivalence checking applied in the fitness function allows us to optimize large logic circuits using CGP, there exist circuits for which the runtime of state-of-the-art SAT solvers grows exponentially with the size of the problem instance. In order to shorten the decision time, various methods can be applied to reduce the number of clauses for the SAT solver. We proposed to utilize knowledge of the genes which have been modified by the mutation operator to calculate a 'difference' between the parent individual and its offspring [11]. Note that this 'difference' circuit is sufficient for checking the functional equivalence of the parent circuit and its offspring, and thus only the 'difference' is submitted to the SAT solver. An example of a reference (parent) circuit and a modified circuit (offspring) is shown in Fig. 3a,b. The 'difference' circuit (Fig. 3c) consists of 8 gates (7 + 1 XOR). This is a significant reduction with respect to the standard all-output approach, which leads to 17 gates (14 + 2 XOR + 1 OR). The resulting CNF is shown in Fig. 3d.
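One ingredient of such a reduction, finding which outputs can be influenced by the mutated genes, can be sketched in Python as follows. This is a simplified illustration under stated assumptions ($n_a = 2$, $n_r = 1$, flat integer chromosome with inputs at addresses 0 to $n_i - 1$) and is not the construction of [11]. The function name `affected_outputs` is hypothetical.

```python
from typing import Iterable, List, Sequence, Set


def affected_outputs(chrom: Sequence[int], n_i: int, n_o: int,
                     mutated_genes: Iterable[int]) -> List[int]:
    """Indexes of outputs that may differ between parent and offspring.

    chrom is the offspring chromosome and mutated_genes holds the positions
    that the mutation operator touched. The result is conservative: it may
    list an output whose function did not actually change.
    """
    mutated = set(mutated_genes)
    n_gates = (len(chrom) - n_o) // 3
    dirty: Set[int] = set()  # addresses of gates whose value may have changed
    for k in range(n_gates):
        own_genes_changed = any(g in mutated for g in range(3 * k, 3 * k + 3))
        reads_dirty = chrom[3 * k] in dirty or chrom[3 * k + 1] in dirty
        if own_genes_changed or reads_dirty:
            dirty.add(n_i + k)
    base = 3 * n_gates       # position of the first output gene
    return [j for j in range(n_o)
            if (base + j) in mutated or chrom[base + j] in dirty]
```

Only the listed outputs would need an XOR gate in the miter, and only the logic cones of those outputs would need to be translated into clauses. An empty list means that the offspring is trivially equivalent to its parent, so no SAT call is required.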
V. EXPERIMENTS
The modified selection strategy will be combined with the fast equivalence checking-based fitness function to optimize complex combinational circuits. In order to investigate whether the approach is useful, we will compare two CGP-based optimization methods.
• Method A utilizes standard CGP and a fast SAT-based fitness function.
• Method B utilizes CGP with modified selection strategy and a fast SAT-based fitness function.
The CGP parameters are identical for both methods.
Note that the ABC and SIS tools are deterministic. We used them with the standard settings and with the aim of minimizing the area. In order to improve their results, we applied them iteratively to their own results. The implementation cost (i.e. area) of a circuit is calculated as a weighted sum of the gates that are utilized in a candidate circuit. The weights, which are given in Table II, reflect the real sizes of the corresponding transistor-level implementations. MiniSAT 2 (version 070721) was used as the SAT solver [22] because it can easily be embedded into a custom application.
The experiments were carried out on a cluster of Intel Xeon X5670 2.4 GHz processors using the Sun Grid Engine (SGE), which allows the experiments to run in parallel. All statistical results were calculated from 50 independent 12-hour runs.
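The weighted-sum area measure mentioned above amounts to a single summation, sketched here in Python. The weights in `EXAMPLE_WEIGHTS` are placeholders invented for this illustration; they are not the values of Table II, which is not reproduced in this text.

```python
from typing import Dict, Sequence

# Placeholder weights for illustration only; NOT the values of Table II.
EXAMPLE_WEIGHTS: Dict[str, float] = {
    "NOT": 0.5,
    "NAND": 1.0,
    "AND": 1.5,
    "XOR": 2.0,
}


def relative_area(gates: Sequence[str], weights: Dict[str, float]) -> float:
    """Weighted sum over the gates that are actually used by the circuit."""
    return sum(weights[g] for g in gates)
```

Only utilized gates should be passed in, so that inactive gates in the chromosome do not contribute to the cost.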
VI. RESULTS

A. Setting of CGP parameters
In order to find suitable values for the CGP parameters, we performed initial experiments on selected benchmark circuits. We evaluated the effect of various combinations of the population size ($\lambda$), the mutation intensity ($h$) and the l-back parameter. Table III gives the relative improvement (in %) over the initial circuit that was achieved using Methods A and B. We can observe that increasing the mutation intensity leads to a larger circuit area, which is not desired. Reducing the population size and modifying the l-back parameter influence the quality of optimization only insignificantly. This observation holds for Method A as well as Method B. Therefore, the following setting seems to be the most suitable: $\lambda = 2$, $l = n_c = N_g$, $h = 2$, and $n_r = 1$.
Table IV gives the basic parameters of the 21 LGSynth93 benchmark circuits. Note that only circuits with 10 or more inputs were considered for our experiments. For each circuit, the number of inputs and outputs is given. The 'Area' column shows the weighted number of gates (relative area) of the initial solution (the CGP seed), i.e. the best result of 100 iterative applications of conventional synthesis conducted using ABC. Delay is also reported. The number of evaluations (the Eval column) allowed for CGP results from the particular circuit size and the total optimization time available. Table V and Table VI summarize the results for Method A and Method B in terms of:
• BstF - the best fitness (i.e. the smallest relative area).
• BstF_D - the delay of the best obtained circuit.
• BstD - the shortest delay obtained.
• BstD_F - the area of the circuit with the shortest delay.
• AvgF - the average fitness (with a standard deviation) calculated from 50 independent 12-hour runs.
• AvgDly - the average delay.
• NDM - the percentage of non-destructive mutations.
Figure 4 shows 12-hour convergence curves for 8 selected circuits that were optimized by Methods A and B. The starting point is always the best result of ABC. We can see from Tables V and VI that Method B gives more compact circuits in 11 out of 21 optimized cases. Both methods significantly outperform the conventional ABC tool; however, the computational requirements are very different. While the iterative application of deterministic ABC quickly (within a few seconds or minutes) leads to a small reduction of circuit size, no further improvements were observed within the following hour. The progress of CGP optimization continues for a longer time. At the cost of a longer runtime, the results of conventional synthesis were significantly improved for the LGSynth93 benchmarks at the end of optimization (the area was reduced by 24% on average). We can also observe that all runs produced very similar results for a particular circuit (the standard deviation is relatively low), which is good for practice. Method B could be recommended for the optimization of smaller circuits (in terms of area; the number of inputs is not important).
The most remarkable result was obtained for the alu4 circuit, where the relative area was reduced from the original value of 819.7 to 106.7 by Method A and to 70.8 by Method B. This result confirms that conventional optimization generates solutions that are far from the optimum for some types of circuits (see the same point in [23], [24], [25]). Hence, it is worthwhile to perform this time-consuming CGP-based optimization. Note that the results are not directly comparable to our previous work since we measure relative area in this paper while papers [25], [11] give the number of gates.
In contrast to our assumptions, the significant reduction in area is not automatically accompanied by a reduction of delay. The delay of the most area-efficient circuits (see the BstF columns) increased on average by 3.00 for Method A and by 5.19 for Method B in comparison with the CGP seeds. Moreover, the circuits showing the shortest delay across all the independent runs exhibit higher delays than the CGP seeds. The average increase is 1.38 for Method A and 2.48 for Method B. Method A provides more stable results in terms of delay than Method B. The reason is that Method B samples more new (yet fully functional) candidate circuits, and hence the probability that the delay of an offspring differs from that of its parent is higher. The role of selection seems to be crucial in this task.
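As a quick check of the alu4 figures quoted above, the relative area reductions follow directly from the stated values (rounded to one decimal place in percent):

```latex
\[
1 - \frac{106.7}{819.7} \approx 0.870 \quad (\text{Method A, about } 87.0\%),
\qquad
1 - \frac{70.8}{819.7} \approx 0.914 \quad (\text{Method B, about } 91.4\%).
\]
```

Both reductions are far above the 24% average improvement reported for the whole benchmark set, which is why alu4 stands out.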
B. Analysis of Selection
Although Method B does not consider the circuit size when selecting a parent individual, it still provides better results than Method A for many problem instances. Recall that the fitness landscape is rugged and neutral in the case of digital circuit evolution using CGP [26]. In the standard CGP, the generation of offspring is biased toward the best individual that has been discovered so far. The best individual is changed only if a better or equally scored solution is discovered. In Method B, the parent individual changes more frequently because the only requirement for a candidate individual to qualify as a parent is its functional correctness. Hence, we consider Method B more explorative than Method A.
Paper [12] suggests that if a high degree of redundancy is present in the genotype, the modified selection strategy of Method B would generate more functionally correct individuals than Method A. And because the fitness landscape is rugged and neutral, Method B would be more efficient in finding compact circuit implementations than Method A. In order to verify this hypothesis for our case (where large circuits are optimized, in contrast to the small circuits in paper [12]), we measured the number of non-destructive mutations (NDMs), i.e. neutral-to-fitness and fitness-improving mutations.
Tables V and VI show the average percentage of NDMs (calculated from all runs). Method B produces 6.2% more NDMs than Method A. When Method B gives a smaller circuit than Method A, the percentage of NDMs is higher for Method B in 10 out of 11 cases. On the other hand, when Method A gives a smaller circuit than Method B, the percentage of NDMs is higher for Method A in only 5 out of 10 cases.
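The NDM measurement can be sketched in Python as a simple counter. The sketch takes the definition literally: a mutation counts as non-destructive when the fitness of the offspring is equal to or better than that of its parent, with higher fitness assumed to be better. How the authors treat functional but larger offspring is not specified here, so this interpretation and the name `ndm_percentage` are assumptions.

```python
from typing import Iterable, Tuple


def ndm_percentage(pairs: Iterable[Tuple[float, float]]) -> float:
    """Percentage of (f_parent, f_child) pairs with f_child >= f_parent."""
    total = 0
    nondestructive = 0
    for f_parent, f_child in pairs:
        total += 1
        if f_child >= f_parent:   # neutral or improving mutation
            nondestructive += 1
    return 100.0 * nondestructive / total if total else 0.0
```

In a run, one pair would be recorded for every evaluated offspring, and the percentages of all runs would then be averaged.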
VIII. CONCLUSIONS
We evaluated several aspects of the CGP-based evolutionary post-synthesis optimization of complex combinational circuits. In particular, we utilized an equivalence checking algorithm in the fitness function to reduce the fitness computation time. In addition, two selection strategies were compared with the aim of reducing the area (the weighted number of gates) of the LGSynth93 benchmark circuits. It was shown that the modified selection strategy leads to more compact circuits in roughly 50% of cases in comparison with the original selection strategy of CGP. The average area improvement is 24% with respect to the results of conventional synthesis. The main drawback of the proposed methods is that the delay of the optimized circuits increased in many cases. Our plan for future work is to integrate a truly multiobjective optimization engine into the proposed methods.
IX. ACKNOWLEDGMENTS
This work was supported by the Czech Science Foundation project P103/10/1517, the research programme MSM 0021630528, the BUT project FIT-S-11-1 and the IT4Innovations Centre of Excellence CZ.1.05/1.1.00/02.0070.
