Abstract. This paper presents a methodology to design optimized electronic digital systems from high abstraction level descriptions. The methodology uses Genetic Programming in addition to high-level synthesis tools to automatically improve design structural quality (area measure). A two-stage, multiobjective optimization algorithm is used to search for circuits with the desired functionality, subject additionally to chip area constraints. An experiment with a square-root approximation datapath design targeted to an FPGA exemplifies the proposed methodology.
Introduction
To cope with this problem, most behavioral synthesis tools constrain the design space by assuming a "target architecture" to which all high level specifications are mapped. The result is a reasonable average solution locally optimized for each design problem.
On the other hand, layout synthesis and hardware implementations are considerably simplified by FPGA (Field Programmable Gate Arrays) prototyping. By eliminating the need to cycle through an IC production facility, both time-to-market and financial risk can be substantially reduced. Recent statistics [8] indicate that, today, approximately one half of all chip designs are started using FPGAs. This trend is followed in modern synthesis tools that offer a smooth migration path between FPGA and ASIC technologies, providing direct mapping of an FPGA design into ASIC libraries [9] .
This work presents a digital IC design methodology based on Evolvable Hardware (EHW) techniques [10] , [11] . Genetic Programming is used to automatically explore the design space and improve the design quality. To accomplish this, a multi-criteria fitness function was defined to rank candidate solutions according to their area sizes. An experiment with a square-root approximation datapath design targeted to FPGA is detailed. Simulation results confirm the automatic generation of optimized structures starting from a high level specification.
The paper is organized as follows: Section 2 presents related works in order to establish the context in which the proposed methodology is applied. Section 3 discusses the Evolutionary Computation and Genetic Programming concepts of interest to the proposed methodology. The design methodology is detailed in Section 4 and the example of a datapath design is presented in Section 5. Conclusions are drawn in Section 6.
Related Works
Evolutionary Algorithms have already been used in lower levels of electronic design, including routing, partitioning and placement [1], [12], and more recently in the automation of the structural design of digital circuits, as well as in the search for quality implementations, as described below.
Recent works [13] - [16] present evolved structures entirely in software using computer simulations (extrinsic evolution) to evaluate intermediate solutions. In [13] an algorithm capable of evolving 100% functional arithmetic circuits is presented. Based on a rectangular array of uncommitted logic cells of FPGAs, the algorithm is able to re-discover conventional optimum designs for one and two-bit adders, eventually improving the gate count of human produced designs of the two-bit multiplier.
Kalganova & Miller [14] addressed a two-stage multiobjective fitness function. In the first stage a 100% functional solution, which occurs when the truth table is matched, is sought. In the second stage the number of gates actually used in each candidate solution is taken into account in the fitness function. This allows circuits to evolve with the desired functionality while minimizing their number of active gates. The authors limited their focus to combinational logic circuits, containing no memory elements and no feedback paths. A similar multiobjective optimization technique was proposed by Coello et al. [15], the main difference being the use of a population-based technique to split the search task among several sub-populations. In this approach each objective is assigned to a small sub-population, which is merged with the rest of the individuals when one of the objectives is satisfied. The merged populations contribute to minimizing the total number of mismatches between the encoded circuits and the truth table.
Hounsell & Arslan [16] addressed an EHW environment called Virtual Chip in which timing constraints are taken into account. In this environment not only gates but also functional macros, such as half-adders and compound gates, are available to the evolutionary process. In order to speed up the overall process, a single VHDL program describing all candidate circuits is used in the simulations. In this work a phased approach was introduced to ease the evolution of complex systems. The phased evolution consists of evolving a circuit in stages: the system initially evolves a "sub-circuit" for each output of the desired circuit; subsequently, redundant logic between the evolved sub-circuits is removed as they are combined to generate the required circuit. Using this approach a 3-bit multiplier was evolved with results as good as CAD-based circuits.
An alternative to the above approaches is the use of formal grammars to represent the evolution of hardware designs, as in the work of Hemmi et al. [17]. In this approach, hardware specifications, which produce not only hardware structures but also behaviors, are automatically generated as HDL programs. To create HDL descriptions the authors used Genetic Programming (GP) to evolve trees from productions (rewriting rules in the HDL grammar). The fitness is evaluated using simulation tools, and the hardware implementation is created after the learning task is completed. This system was used to automatically generate a sequential binary adder.
The approach presented in this paper uses the main techniques reported above, together with new improvements. As in [17], a formal grammar of an HDL was used to build candidate solutions. However, the place-and-route task was placed inside the evolutionary process in order to supply the fitness function with chip area information. The multiobjective fitness function adopted closely parallels the two-stage approach suggested in [14], although applied in a different framework. Also, an experiment was carried out using a single VHDL program, as suggested in [16], in order to reduce computing effort.
Evolutionary Computation and Genetic Programming
Evolutionary Computation (EC) consists of a class of probabilistic algorithms inspired by the principles of Darwinian natural selection and variations that can greatly benefit from the increased simulation power for complex design, control and knowledge discovery applications [18] . Genetic Algorithm (GA) and Genetic Programming (GP) are the instances of EC most widely used in Evolvable Hardware [19] .
GA [20] is based on the evolutionary process occurring in nature where a population is shaped by the survival of best-fit individuals. GA usually works with strings of fixed length in which one or more parameters are coded. GP is a branch of GA, the main difference being the solution representation. Unlike GA, GP can easily code genotypes of different sizes, which increases the capacity in structure creation.
Koza introduced canonical GP [21] with a tree-based genome using the LISP language. Since then, other genome types have been proposed, such as linear and graph genomes [22]. Other GP systems were developed to automatically create programs in arbitrary languages (other than LISP) [23], [24]. An alternative way to encode domain knowledge is by formal grammars [25]. Programs may be produced by combining a context-free grammar (CFG) with GP [26]-[29], an approach known as G3P (Grammar-Guided Genetic Programming). G3P provides a framework for the automatic creation of error-free programs in an arbitrary language. Genetic Programming Kernel (GPK) [28] was used as the core engine for the GP system used in this paper. GPK is a complex system that evolves programs in any language once a Backus-Naur form (BNF) is provided as input. However, the initialization procedure was changed to overcome the GPK difficulty associated with creating the first generation, as pointed out in [29].
Design Methodology
The proposed design methodology evolves independent VHDL programs, trying to breed a solution (hardware description) that implements the desired functionality and satisfies the design constraints for area. This methodology uses a mixture of C code and VHDL, together with HLS tools. The execution flow of the proposed methodology is drawn in Fig. 1. This figure shows the two main components of the methodology: the GP core and the Valuation function. It can also be seen that, unlike other methodologies [16], [17], synthesis and place-and-route are performed for each candidate solution, which gives an exact value for the area.

The whole methodology is based on the GP algorithm. Fig. 2 shows the flowchart of the GP system used, adapted from [21]. In this figure, the index "i" refers to an individual in the population of size M. The variable "Gen" is the number of the current generation. A population of M individuals (programs) is randomly created under the restriction of a pre-defined BNF containing a subset of the VHDL grammar that obeys the sufficiency property [21]. The system runs until the termination criterion is satisfied. Since the proposed methodology is concerned with unknown solutions, termination occurs when the number of generations exceeds a pre-defined maximum G (generational predicate). To create the VHDL source code, the entity declaration and the architecture declarative part of the VHDL program are kept fixed, as they are common to all individuals. Only the architecture statement part of the VHDL code evolves.
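The generational loop described above can be sketched as follows. This is a minimal illustration, not the GPK implementation: the function names `run_gp`, `create`, `evaluate` and `breed` are hypothetical, individuals are left abstract, and the toy problem at the end (integers evolving toward 42) merely stands in for the synthesis, place-and-route and simulation steps of the real flow.

```python
import random

def run_gp(create, evaluate, breed, M=100, G=51, seed=0):
    """Generational loop; lower fitness is better.

    create(rng) -> individual
    evaluate(individual) -> float
    breed(population, fitnesses, rng) -> new population
    """
    rng = random.Random(seed)
    population = [create(rng) for _ in range(M)]   # random initial population
    best, best_fit = None, float("inf")
    for gen in range(G):                           # stop after G generations
        # In the paper's flow, evaluation would involve synthesis,
        # place-and-route and simulation of each VHDL program.
        fitnesses = [evaluate(ind) for ind in population]
        for ind, fit in zip(population, fitnesses):
            if fit < best_fit:
                best, best_fit = ind, fit
        population = breed(population, fitnesses, rng)
    return best, best_fit

# Toy stand-ins (assumptions for illustration only).
def toy_create(rng):
    return rng.randint(0, 255)

def toy_evaluate(ind):
    return abs(ind - 42)

def toy_breed(population, fitnesses, rng):
    # Binary tournament followed by a small random perturbation.
    new_population = []
    for _ in range(len(population)):
        i, j = rng.randrange(len(population)), rng.randrange(len(population))
        parent = population[i] if fitnesses[i] <= fitnesses[j] else population[j]
        new_population.append(max(0, min(255, parent + rng.randint(-2, 2))))
    return new_population

best, best_fit = run_gp(toy_create, toy_evaluate, toy_breed)
print(best, best_fit)
```

Keeping `evaluate` as a plug-in function mirrors the separation between the GP core and the evaluation step shown in Fig. 1.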
The evaluation of the objective function is a key procedure of the GP system. The steps involved in this operation are depicted in Fig. 3 . After a new population is created it is necessary to complete the VHDL source code by inserting the entity declaration and the architecture declarative part of each individual. Then, the individuals are synthesized using Altera MAX+PLUS II compilation tool [30] . To effectively evaluate the fitness of an individual it is necessary to simulate each of the fitness cases and check the results against target values.
The objective function to be minimized is a multi-parameter error function combining functionality with the area parameter in a two-stage approach similar to that proposed in [14]. In this process the functional specification ($F_1$ fitness) must first be matched to trigger the search for an optimal area implementation ($F_2$ fitness). Let the average functionality error, $fe_{av}$, be defined as: The fitness function F is defined as follows:
where te is the target functional specification error for the fitness cases, lu is the number of used logic elements (cf. Subsection 5.3) and k is a function of lu. The k(lu) function is a weighting factor that accounts for the area used to implement the desired functionality. It is an exponential function of lu that favors the individuals containing the fewest logic elements. The k(lu) function takes the form:
where $lu_{min}$ and $lu_{max}$ are the minimum and maximum quantities of logic elements and $k_{min}$ and $k_{max}$ are the minimum and maximum values of k(lu), respectively.
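As a worked sketch, assume the exponential weighting has the form $k(lu) = a\,e^{b\,lu}$ and is pinned to $k_{min}$ at $lu_{min}$ and to $k_{max}$ at $lu_{max}$. Both the functional form and the coefficient names a and b are assumptions introduced here for illustration. Solving the two endpoint conditions gives:

```latex
\begin{aligned}
k(lu) &= a\,e^{\,b\,lu}, \\[2pt]
k(lu_{min}) = k_{min}, \quad k(lu_{max}) = k_{max}
\;&\Rightarrow\;
b = \frac{\ln\!\left(k_{max}/k_{min}\right)}{lu_{max} - lu_{min}}, \qquad
a = k_{min}\,e^{-b\,lu_{min}} .
\end{aligned}
```

For instance, with k ranging over [0.5, 1.0] for lu in [50, 100], this yields b = ln 2 / 50 ≈ 0.0139 and a = 0.25, which agrees with the constants reported in the Results subsection.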
To exemplify the proposed methodology, a datapath design taken from [6] was selected. The aim is to compute the square-root approximation (SRA) of two integers, a and b. In [6] the following approximation formula is given:

$$\sqrt{a^2 + b^2} \cong \max\big((0.875x + 0.5y),\, x\big)$$

where x = max(|a|, |b|) and y = min(|a|, |b|).
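The approximation formula can be checked numerically with a few lines of code. The sketch below is a floating-point reference model only, not the evolved hardware; the function names `sra` and `relative_error` are hypothetical.

```python
import math

def sra(a: int, b: int) -> float:
    """Square-root approximation: sqrt(a^2 + b^2) ~= max(0.875*x + 0.5*y, x)."""
    x = max(abs(a), abs(b))
    y = min(abs(a), abs(b))
    return max(0.875 * x + 0.5 * y, x)

def relative_error(a: int, b: int) -> float:
    """Relative error of the approximation against the exact magnitude."""
    exact = math.hypot(a, b)
    return abs(sra(a, b) - exact) / exact

print(sra(3, 4), relative_error(3, 4))
print(sra(10, 10), relative_error(10, 10))
```

For the 3-4-5 triangle the approximation is exact: `sra(3, 4)` returns 5.0. The coefficients 0.875 and 0.5 are convenient in hardware because they reduce to shifts and one subtraction (x minus x/8, and y/2).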
In this problem of symbolic regression the goal is to find a function in symbolic form that is a good, or an exact, fit to a group of 20 sets of numerical data points. The black box of the SRA is given in Fig. 4 . In this figure, the "start" input and "done" output signals are present for interface control issues. 
BNF Definition
A BNF grammar describes admissible structures of a language through a 4-tuple {S,N,T,P} where S denotes the start symbol, N the set of non-terminal symbols, T the set of terminal symbols and P the productions, i.e., rewriting rules that map the elements of N to T. The BNF that defines the syntax of the VHDL statement part of the SRA description is:

    S ::= <fe>;
    <fe> ::= "PROCESS BEGIN WAIT UNTIL (clk'event and clk='1'); IF in_a>=in_b THEN x<=in_a; y<=in_b; ELSE y<=in_a; x<=in_b; END IF; t6<=" <expr> "; IF t6>=x THEN t7<=t6; ELSE t7<=x; END IF; out_z<=t7; done<='1'; END PROCESS;";
    <expr> ::= "(" <expr> ")" <op> "(" <expr> ")" | <var> | <var> "(7 downto " <shf_size> ")";
    <op> ::= " + " | " - ";
    <var> ::= " x " | " y " | " w ";
    <shf_size> ::= "1" | "2" | "3" | "4" | "5" | "6" | "7";
In the above BNF description the signals "x", "y", "w", "t6" and "t7" are internal signals of the VHDL architecture body. The signal "w", assigned zero, is present to ease the insertion or removal of a terminal. The max and min operations are already fixed in the BNF through the IF clause. The original problem was restricted to use input and output operands of unsigned type instead of integer type, reducing computing effort.
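To illustrate how a grammar of this kind constrains candidate programs, the sketch below draws random derivations of the `<expr>` non-terminal. It is a simplified illustration and not the GPK engine: the dictionary encoding, the function name `derive` and the depth limit `max_depth` are assumptions, and the operator spacing is normalized.

```python
import random

# Productions of the evolving part of the grammar, one list per alternative.
GRAMMAR = {
    "<expr>": [
        ["(", "<expr>", ")", "<op>", "(", "<expr>", ")"],
        ["<var>"],
        ["<var>", "(7 downto ", "<shf_size>", ")"],
    ],
    "<op>": [[" + "], [" - "]],
    "<var>": [[" x "], [" y "], [" w "]],
    "<shf_size>": [[str(n)] for n in range(1, 8)],
}

def derive(symbol, rng, depth=0, max_depth=4):
    """Expand a symbol into a string by randomly choosing productions."""
    if symbol not in GRAMMAR:
        return symbol                      # terminal: emit as-is
    options = GRAMMAR[symbol]
    if symbol == "<expr>" and depth >= max_depth:
        options = options[1:]              # drop the recursive alternative
    production = rng.choice(options)
    return "".join(derive(s, rng, depth + 1, max_depth) for s in production)

rng = random.Random(1)
expression = derive("<expr>", rng)
# The fixed part of the PROCESS would wrap the expression like this:
statement = "t6<=" + expression + ";"
print(statement)
```

Because every string is produced by expanding productions, each candidate is syntactically valid by construction, which is the property that grammar-guided GP relies on.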
GP Parameters
The main control parameters of the GP for the first stage ($F_1$ fitness) of the proposed experiment are shown in Table 1. In the second stage of the evolution ($F_2$ fitness) the GP parameters are the same, except for the objective and raw fitness rows of Table 1, which change to incorporate the search for a design with optimal area.

Table 1. Main GP control parameters for the first stage.
Raw fitness: The average, taken over 20 fitness cases, of the absolute value of the difference between the value of the dependent variable produced by simulating the synthesized VHDL program and the target value of the dependent variable (cf. eq. (1)).
Selection method: Fitness-proportionate.
Main GP parameters: M = 100, G = 51, $p_c$ = 0.73, $p_m$ = 0.12, without elitism.
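Fitness-proportionate selection needs a score that grows with quality, whereas the raw fitness here is an error to be minimized. The sketch below shows one common way to bridge the two, using Koza's adjusted fitness 1/(1 + error). Whether the authors used exactly this mapping is not stated, so it is an assumption, as are the function names `adjusted` and `select`.

```python
import random

def adjusted(raw_errors):
    """Map errors (lower is better) to weights in (0, 1] (higher is better)."""
    return [1.0 / (1.0 + e) for e in raw_errors]

def select(population, raw_errors, rng):
    """Roulette-wheel pick: probability proportional to adjusted fitness."""
    weights = adjusted(raw_errors)
    return rng.choices(population, weights=weights, k=1)[0]

rng = random.Random(0)
population = ["A", "B", "C"]
errors = [0.0, 1.0, 3.0]          # "A" is the best individual
picks = [select(population, errors, rng) for _ in range(10000)]
print({name: picks.count(name) for name in population})
```

With these errors the weights are 1.0, 0.5 and 0.25, so "A" is drawn about four times as often as "C" while weaker individuals still keep a chance of reproducing.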
Results
The EPF8282A FPGA from Altera FLEX 8000 family, with up to 282 logic elements (LEs), was used as target device. Each LE consists of a look-up table (LUT), a function generator that computes any function of four variables, and a programmable register. In this experiment, area will be measured in number of LEs. The target functional specification error (te) was set to 2.5%. It establishes for each fitness case the maximum relative error between the simulated value of the dependent variable, produced by the synthesized VHDL programs, and the desired value.
A population size of 100 individuals was used, evolving up to a maximum of 51 generations. Experiments have been carried out with N = 20, $w_i = 1$ and $k(lu) = 0.25\,e^{0.0139\,lu}$, which guarantees $k(lu) \in [0.5, 1.0]$ for $lu \in [50, 100]$ (cf. eq. (3) and eq. (4)). The initial population was created using the ramped half-and-half generative method [21], replacing the GPK initialization procedure.
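The reported constants can be cross-checked numerically. The sketch below assumes the weighting is an exponential a·exp(b·lu) fixed by its two endpoint values; the helper names `exp_weight_coeffs` and `k` are hypothetical.

```python
import math

def exp_weight_coeffs(lu_min, lu_max, k_min, k_max):
    """Coefficients (a, b) of k(lu) = a*exp(b*lu) through both endpoints."""
    b = math.log(k_max / k_min) / (lu_max - lu_min)
    a = k_min * math.exp(-b * lu_min)
    return a, b

def k(lu, a=0.25, b=0.0139):
    """Weighting factor with the rounded constants quoted in the text."""
    return a * math.exp(b * lu)

a, b = exp_weight_coeffs(50, 100, 0.5, 1.0)
print(a, b)            # coefficients implied by the endpoints
print(k(50), k(100))   # values obtained with the rounded constants
```

The endpoint conditions give a = 0.25 and b = ln 2 / 50, which rounds to 0.0139. Because of that rounding, the quoted constants reproduce the interval [0.5, 1.0] only approximately, with k(100) landing marginally above 1.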
The first functional individual, i.e., one that met all fitness case evaluations with the functional specification target error of 2.5%, was created at generation 18. The system then started the search for an implementation with optimal area. During the second stage of the evolution a variety of individuals agreeing with the specified te value emerged, although with non-optimal area use.

    IF in_a >= in_b THEN x <= in_a; y <= in_b; ELSE y <= in_a; x <= in_b; END IF;
    t6 <= ((y(7 downto 1))+((x)-(x(7 downto 3))))+(y(7 downto 6));
    IF t6 >= x THEN t7 <= t6; ELSE t7 <= x; END IF;
    out_z <= t7; done <= '1';
    END PROCESS;

Fig. 5 shows (a) the evolution of the fitness of the best individual and (b) its area versus generation curve. In Fig. 5 (b) it can be noted that at generation 13 the best individual increases its area in order to achieve better accuracy. At generation 18 it reaches the desired accuracy and begins to look for optimal area, which is found at generation 29. Table 2 reports the optimal area implementations (LE column) obtained for other values of the functional specification target error (te), with the respective hardware descriptions. An interesting result shown in Table 2 is the individual with optimal area for te = 25%, which does not use the internal signal y (the minimum of the two inputs). This indicates that the system actually looks for optimal area, taking into account the routing and resource issues of the synthesis tool.
In some independent runs it could be noted that the first functional individual was also the one with optimal area. Nevertheless, this is not always true, as was shown in the SRA design for te = 2.5%. Also, an experiment was made adding the multiplier and shift-left operators to the right-hand sides of the <op> and <expr> non-terminal symbols, respectively (cf. Subsection 5.1). The system proved robust, eliminating individuals with these two operators from the early generations.

Table 2. Optimal-area implementations for other values of te.
te      LE      Hardware description
25%             x + x(7 downto 3)
15%     58      x + y(7 downto 2)
12%     59      x + y(7 downto 1)
5%      70      x - x(7 downto 3) + y(7 downto 1)
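The trade-off between accuracy and area in these expressions can be explored with a small integer model. The sketch below is an illustration only: it reads a slice v(7 downto n) of an 8-bit unsigned value as a right shift by n, does not model overflow of the 8-bit result, and uses hypothetical names (`slice_hi`, `EXPRESSIONS`, `datapath`).

```python
import math

def slice_hi(v, lo):
    """v(7 downto lo) of an 8-bit unsigned value, read as an unsigned number."""
    return (v & 0xFF) >> lo

# Expressions as listed in Table 2, assigned to t6 in the fixed PROCESS.
EXPRESSIONS = {
    "x + x(7 downto 3)": lambda x, y: x + slice_hi(x, 3),
    "x + y(7 downto 2)": lambda x, y: x + slice_hi(y, 2),
    "x + y(7 downto 1)": lambda x, y: x + slice_hi(y, 1),
    "x - x(7 downto 3) + y(7 downto 1)":
        lambda x, y: x - slice_hi(x, 3) + slice_hi(y, 1),
}

def datapath(expr, in_a, in_b):
    """Fixed part of the design: max/min ordering, t6 expression, final max."""
    x, y = (in_a, in_b) if in_a >= in_b else (in_b, in_a)
    t6 = expr(x, y)
    return t6 if t6 >= x else x

in_a, in_b = 120, 90
exact = math.hypot(in_a, in_b)
for name, expr in EXPRESSIONS.items():
    out_z = datapath(expr, in_a, in_b)
    print(f"{name}: out_z={out_z}, relative error={abs(out_z - exact) / exact:.3f}")
```

For inputs 120 and 90, whose exact magnitude is 150, the four expressions give 135, 142, 165 and 150 respectively, so the expression with the most terms is also the most accurate at this point.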
The drawback of the proposed methodology is its excessive time consumption. Currently available tools for compiling VHDL programs are slow, particularly the place-and-route ones [31]. To overcome this problem a pre-stage, to be run before the start of the main optimization process shown in Fig. 1, was introduced. In the pre-stage the system evolves a single VHDL program as in [16], in which each individual is described in a single VHDL PROCESS, to bring the whole population close to the target functionality. No place-and-route is performed at this stage. The pre-stage finishes when the first functional individual emerges.
With this revised approach the system runs three times faster while remaining at the pre-stage. It was observed that the performance of the overall design process improves considerably when compared with the original approach (Fig. 1). 
Conclusion
A methodology based on the Genetic Programming paradigm to evolve optimized implementations was presented. It relies on a multiobjective evolutionary technique. The methodology is fully automatic in the sense that it is not necessary to write code, only to define a BNF grammar and specify the design constraints. Cycling through a place-and-route process makes it possible to consider the exact area used by each implementation in the optimization.
The proposed methodology is capable of synthesizing combinational circuits, using basic digital or mathematical operators declared in a BNF, that approximate a given function with a specific accuracy while looking for optimal area. From this work it can be concluded that: i) It is not necessary to exactly specify the basic operators in the BNF. ii) Performance optimization may be achieved instead of (or in addition to) area optimization if timing parameters are considered in the objective function. iii) The methodology is synthesis-tool independent.
Finally, the methodology can be used to implement a new function in a semi-filled FPGA, i.e., in addition to an already described function (in a VHDL PROCESS of the VHDL program). In this case the new function is described in a new VHDL PROCESS. The code of the already described function is kept fixed while the new code evolves. The system will try to synthesize the new function using any constructs common to both functions in order to reduce area usage.
