Abstract
Introduction
Various strategies have been developed in recent years to implement adaptive and evolvable hardware [10, 20]. One promising path is evolvable hardware at the Intellectual Property (IP) level, which was introduced by Stoica et al. [14]. Sekanina has shown how an evolvable IP core can be implemented in an ordinary FPGA [11]. The realization is based on creating a virtual reconfigurable circuit (VRC) and an evolutionary algorithm in the ordinary FPGA.
Contemporary digital evolvable systems are built either as ASICs or as boards containing FPGAs combined with a powerful PC on which the evolution is carried out. From this perspective, the approach utilizing a VRC offers many benefits, including: (1) It is relatively inexpensive, because the whole evolvable system is realized using an ordinary FPGA. (2) The architecture of the (virtual) reconfigurable device can be designed exactly according to the needs of a given problem. (3) Because the whole evolvable system is available at the level of HDL source code, it can easily be modified and synthesized for various target platforms (FPGA families). (4) The evolvable (hardware) system can be offered and reused as software (i.e. as a soft IP core).
The objective of this paper is to demonstrate that (1) a complete implementation of a class of digital evolvable systems (those based on virtual reconfigurable circuits) can be generated automatically from a high-level specification and (2) non-trivial circuits can be evolved effectively using this implementation. We present an approach that allows designers to rapidly describe, simulate, synthesize and realize a domain-specific virtual reconfigurable circuit. Together with a hardware implementation of the evolutionary algorithm, the whole evolvable system can routinely be realized in an ordinary FPGA (placed on a general-purpose board) in a reasonably short time. We show that non-trivial combinational circuits (e.g. multipliers) can be evolved in a few seconds on this kind of evolvable machine, a result comparable to the evolution of analog circuits in a second on the FPTA [15]. The COMBO6 card developed in the Liberouter project is used as the target platform [5].
The rest of this paper is organized as follows. Section 2 summarizes the approaches used to realize evolvable systems on FPGAs, including virtual reconfigurable circuits. Section 3 presents the proposed method for the routine design of evolvable digital systems. Section 4 deals with the target platform used to verify the method. Section 5 describes the experiments performed and their results. The results are discussed in Section 6, and conclusions are given in Section 7.
Evolvable Systems in FPGAs
If continually evolving systems are to be realized, the evolutionary algorithm has to be placed in the target system. The evolutionary algorithm is responsible for adaptation to the changing environment, which is reflected in a changing fitness function.
Common Approaches
Although various (digital) evolvable systems have been implemented as ASICs (typical examples are given in [2]), this solution is relatively expensive. Hence great effort has been invested in designing evolvable systems at the level of FPGAs. These solutions can be divided into two groups:
(1) The FPGA is used to evaluate circuits produced by an evolutionary algorithm that is executed in software (running on a PC or DSP). Initial experiments were carried out by Thompson, who evolved interesting circuits and discovered that evolution can exploit physical properties of the electronic platform to build a solution [16].
(2) The whole evolvable system is implemented in the FPGA(s). This type of implementation integrates a hardware realization of the evolutionary algorithm and a reconfigurable device. As an example, we can mention the research of Tufte and Haddow, who introduced the Complete Hardware Evolution approach, in which the evolutionary algorithm is implemented on the same FPGA as the evolving design [17]. The evolvable system is considered as a pipeline and was demonstrated on adaptive 1D signal filtering [18]. In their approach, only the coefficients of the FIR filter (not circuits) were evolved. Perkins et al. presented a self-contained FPGA-based implementation of a spatially structured evolutionary algorithm that provided significant speedup over conventional serial processing for non-linear filtering [9]. In another approach, Martin implemented a set of processors on an FPGA that evaluated (in parallel) programs generated on the same FPGA [6]. These implementations require a hardware realization of the evolutionary algorithm; this area is relatively independent of evolvable hardware, and various implementations have been proposed, for example [13].
A typical feature of these approaches is that the chromosomes are transformed into a configuration bit stream, which is then uploaded into the FPGA. Xilinx introduced Jbits to make this work easier [4]. Hollingworth et al. showed how Jbits can be utilized for evolvable hardware [3]. However, the configuration bit streams of FPGA vendors are usually very complex and not easy to decode. Furthermore, most families of FPGAs can be configured only externally (i.e. from an external device connected to the configuration port). Internal reconfiguration means that a circuit placed inside an FPGA can configure programmable elements of the same FPGA (which is important for evolvable hardware). Although the internal configuration access port (ICAP) has been integrated into the newest Xilinx Virtex II family [1], it is still too slow for our purposes.
Virtual Reconfigurable Circuits
Evolvable systems utilizing virtual reconfigurable circuits belong to category (2) defined in the previous section. VRCs were introduced for digital evolvable hardware as a new kind of rapidly reconfigurable platform utilizing conventional FPGAs [10, 12]. When a VRC is uploaded into the FPGA, its configuration bit stream has to create the following units at specified positions: an array of programmable elements (PEs), a programmable interconnection network, a configuration memory and a configuration port. Fig. 1 shows that the VRC is in fact a new reconfigurable circuit (consisting of 8 programmable elements in the example) realized on top of an ordinary FPGA. "Virtual" PE2 depicted in Fig. 1 is controlled using 6 bits that select its operands (2+2 bits) and its internal function (2 bits). This architecture is very similar to the representation employed in Cartesian Genetic Programming (CGP), which has been developed for circuit evolution [7]. The routing circuits are created using multiplexers. The configuration memory of a VRC is typically implemented as a register array. All bits of the configuration memory are connected to the multiplexers that control routing and the selection of functions in PEs.
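The multiplexer-based control of a single "virtual" PE can be sketched as a small software model. This is a minimal Python illustration, assuming a hypothetical bit layout (operand A select in the two most significant bits, then operand B select, then the function field) and a hypothetical four-function set; Fig. 1 fixes only the 2+2+2 split, not these details. The names `virtual_pe`, `mux` and `FUNCTIONS` are illustrative only.

```python
from typing import Sequence

# Hypothetical 2-bit function table (the actual PE functions are not listed here).
FUNCTIONS = (
    lambda a, b: a & b,        # 00: AND
    lambda a, b: a | b,        # 01: OR
    lambda a, b: a ^ b,        # 10: XOR
    lambda a, b: 1 - (a & b),  # 11: NAND
)

def mux(sources: Sequence[int], select: int) -> int:
    """Routing multiplexer: forward the selected source to the output."""
    return sources[select]

def virtual_pe(config: int, sources: Sequence[int]) -> int:
    """Evaluate one PE controlled by a 6-bit configuration word.

    Assumed layout, MSB first: 2 bits select operand A, 2 bits select
    operand B, 2 bits select the function.
    """
    if not (0 <= config < 64 and len(sources) == 4):
        raise ValueError("expects a 6-bit config and 4 candidate sources")
    sel_a = (config >> 4) & 0b11
    sel_b = (config >> 2) & 0b11
    func = config & 0b11
    return FUNCTIONS[func](mux(sources, sel_a), mux(sources, sel_b))

# Operand A = source 0, operand B = source 1, function XOR (0b10)
print(virtual_pe(0b00_01_10, [0, 1, 1, 0]))  # 1
```

Changing the configuration word rewires the PE without touching the underlying FPGA fabric, which is the essence of the virtual layer.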
The main advantage of the proposed method is that the array of PEs, the routing circuits and the configuration memory can be designed exactly according to the requirements of a given application. Furthermore, the style of reconfiguration and granularity of the new VRC can exactly fit the needs of a given application. Because VRCs can be described in HDL, they can be synthesized with various constraints and for various target platforms.
As Fig. 2 shows, the VRC can be connected directly to a hardware implementation of the evolutionary algorithm. Note that FPGA virtualization is sometimes used in the reconfigurable computing domain to increase performance. In the case of evolvable hardware, Sekanina used a functional-level VRC to implement adaptive image filters [12]. However, his system was designed for a single specific application; no automatic design tools were used.
Design Approach
Figure 3 shows the general approach to the routine design of evolvable systems using virtual reconfigurable circuits. The basic idea behind the design system is that the user specifies the target application at a high level of abstraction and the design system automatically generates VHDL code of the application, which can be synthesized for various target platforms. In particular, the specification includes: a description of the VRC architecture (the number of inputs and outputs, types and organization of programmable elements, configuration options, configuration strategy, etc.), a description of the evolutionary algorithm (type, parameters, chromosome encoding, etc.) and of the fitness evaluation, and the interaction of all these units. The approach is in fact based on combining and tuning various predefined templates. Currently, the design system consists of
The VRC Designer
The user can choose the architecture of the virtual reconfigurable circuit and its parameters. The tool then automatically generates synthesizable VHDL code and a simulator (in C) for the required VRC. The simulator is useful for evaluating the VRC in software. Currently, only a single type of VRC is supported: a pipelined array of programmable elements. The architecture is based on the circuit model introduced in CGP and extended to the functional level in [10]. In CGP, digital circuits are composed of programmable elements arranged in a regular pattern of x rows and y columns. The configuration bit stream determines the configuration of these elements, their interconnection and the connection of primary inputs and outputs. As an example, the following code has been taken from the specification file that was prepared in order to generate VHDL code of the VRC that will be described in Section 5.2:
The configuration interface is the same for all VRCs. Templates for other types of VRCs will be available in the future.
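To illustrate how a configuration determines the behavior of a CGP-like array, here is a minimal Python sketch of an array evaluator. It stands in for the kind of simulator the VRC Designer generates but does not reproduce it. It assumes the restricted interconnect used later for Problem 1 (each PE reads primary inputs or outputs of the previous column), a hypothetical function set, and genes already decoded into (source A, source B, function) triples; `evaluate_array` and `FUNCS` are hypothetical names.

```python
from typing import List, Sequence, Tuple

# Hypothetical function set; the index is the function field of a PE gene.
FUNCS = (
    lambda a, b: a & b,   # 0: AND
    lambda a, b: a | b,   # 1: OR
    lambda a, b: a ^ b,   # 2: XOR
    lambda a, b: 1 - a,   # 3: NOT of the first operand
)

Gene = Tuple[int, int, int]  # (operand A source, operand B source, function)

def evaluate_array(inputs: Sequence[int], genes: List[List[Gene]],
                   outputs: Sequence[int]) -> List[int]:
    """Evaluate a rows x cols array column by column.

    genes[c][r] configures the PE in column c, row r. Source indices refer
    to the primary inputs followed by the outputs of the previous column.
    `outputs` selects PEs of the last column. Pipeline registers only add
    latency and do not change the computed values, so they are not modelled.
    """
    prev: List[int] = []
    for column in genes:
        sources = list(inputs) + prev  # what each multiplexer can select
        prev = [FUNCS[f](sources[a], sources[b]) for a, b, f in column]
    return [prev[i] for i in outputs]

# Usage: a 2 x 2 array wired as a half adder
half_adder = [[(0, 1, 2), (0, 1, 0)],   # column 0: XOR -> sum, AND -> carry
              [(2, 2, 1), (3, 3, 1)]]   # column 1: pass both through (x OR x)
print(evaluate_array([1, 1], half_adder, outputs=[0, 1]))  # [0, 1]
```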
The EA Designer
The user can choose the architecture of the evolutionary algorithm and its parameters. This tool then automatically generates synthesizable code of the required evolutionary algorithm. Currently, only a simple generic evolutionary algorithm is available. The genetic unit is composed of reusable parametric modules. Figure 6 shows that the designer can choose the type and size of the chromosome memory (population size), the chromosome size, the mutation unit, etc. The interfaces of all genetic units generated by the EA Designer are uniform in order to ensure connectivity with the VRC. It is assumed that the unit is controlled from the environment (using signals such as NC, "generate new configuration", and BC, "get the best configuration") and that the environment is able to evaluate any chromosome that was generated by the genetic unit and uploaded into the VRC. This strategy is known from evolvable components [10]. Templates for other types of evolutionary algorithms will be available in the future.
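The command-driven protocol between the environment and the genetic unit can be sketched in Python as below. The method names `nc`, `report` and `bc`, the class name `GeneticUnit`, and the single-bit-flip variation operator are assumptions made for illustration; the text only specifies that the environment issues commands such as NC and BC and evaluates every chromosome the unit produces.

```python
import random
from typing import List, Optional

class GeneticUnit:
    """Sketch of an environment-controlled genetic unit (hypothetical API).

    nc()     models the NC command: return a new candidate configuration.
    report() lets the environment return the fitness it measured for it.
    bc()     models the BC command: return the best configuration so far.
    """

    def __init__(self, chromosome_bits: int, seed: int = 1) -> None:
        self.rng = random.Random(seed)
        self.bits = chromosome_bits
        self.best: Optional[List[int]] = None
        self.best_fitness = -1
        self.pending: Optional[List[int]] = None

    def nc(self) -> List[int]:
        if self.best is None:   # nothing evaluated yet: random chromosome
            cand = [self.rng.randint(0, 1) for _ in range(self.bits)]
        else:                   # otherwise flip one random bit of the best
            cand = list(self.best)
            cand[self.rng.randrange(self.bits)] ^= 1
        self.pending = cand
        return cand

    def report(self, fitness: int) -> None:
        if self.pending is None:
            raise RuntimeError("report() must follow nc()")
        if fitness > self.best_fitness:
            self.best, self.best_fitness = self.pending, fitness
        self.pending = None

    def bc(self) -> Optional[List[int]]:
        return self.best

# The environment drives the unit and evaluates every candidate.
unit = GeneticUnit(chromosome_bits=16, seed=3)
for _ in range(2000):
    candidate = unit.nc()           # NC: request a new configuration
    unit.report(sum(candidate))     # toy fitness: number of 1s
print(unit.best_fitness, unit.bc())
```

Keeping evaluation outside the unit is what makes the same genetic unit reusable with any VRC and fitness circuit.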
The Fitness Designer
The environment has to evaluate any circuit uploaded into the VRC and send the fitness value to the genetic unit (see Fig. 2). Because the fitness calculation is application-specific, its definition is left open to the user. However, a generic template is available for typical situations, such as the evaluation of combinational circuits using a truth table. This implementation sends test vectors to the inputs of the VRC, collects responses at the outputs of the VRC, compares them with the required vectors and increments the fitness counter accordingly. The implementation is pipelined, and its pipeline naturally extends the pipeline of the VRC.
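Truth-table fitness for a multiplier can be sketched as follows: one point is awarded per correctly computed output bit over all input vectors. This Python model assumes that input bit i of a test vector drives VRC input i (so the low bits carry the first operand); `multiplier_truth_table` and `truth_table_fitness` are hypothetical names, and the candidate circuit is any callable mapping an input vector to an output word.

```python
from typing import Callable, List

def multiplier_truth_table(a_bits: int, b_bits: int) -> List[int]:
    """Required outputs for all 2**(a_bits + b_bits) input vectors.

    Assumed layout: the low a_bits of a vector are the first operand,
    the next b_bits the second operand.
    """
    mask_a = (1 << a_bits) - 1
    return [(v & mask_a) * (v >> a_bits) for v in range(1 << (a_bits + b_bits))]

def truth_table_fitness(circuit: Callable[[int], int], table: List[int],
                        out_bits: int) -> int:
    """One point per correctly computed output bit over all test vectors."""
    mask = (1 << out_bits) - 1
    fitness = 0
    for vector, required in enumerate(table):
        wrong = (circuit(vector) ^ required) & mask   # bits that differ
        fitness += out_bits - bin(wrong).count("1")
    return fitness

table = multiplier_truth_table(3, 3)             # 64 test vectors
perfect = lambda v: (v & 0b111) * (v >> 3)       # a correct 3x3-bit multiplier
print(truth_table_fitness(perfect, table, out_bits=6))  # 384 = 64 vectors x 6 bits
```

With this scoring, a candidate is a correct multiplier exactly when it reaches the maximum of 64 × 6 = 384 points.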
The Integrator
The integrator corresponds to the top-level-entity file in VHDL. It interconnects VRC, genetic unit and fitness calcu- 
Target Platform: COMBO6
COMBO6, developed in the Liberouter project, is a PCI card primarily intended as a hardware accelerator for a dual-stack (IPv4 and IPv6) router [5, 8]. The board offers very high computational power (a Xilinx Virtex II XC2V3000 FPGA with more than 3 million equivalent gates, up to 2 GB of DDR SDRAM, up to 9 Mbit of content-addressable memory, etc.) and is therefore well suited for development and use in various application domains, including evolvable hardware.
We decided to use this card for our experiments because it offers sufficient performance and FPGA capacity. Furthermore, the card was developed in cooperation with the Faculty of Information Technology in Brno. Nevertheless, a primary advantage of the proposed approach is that any FPGA-based system of sufficient capacity can serve as the target platform.
Experiments and Results
The major objective of this section is to demonstrate that (1) useful digital evolvable hardware systems can be realized physically in a very short time (thus substantially reducing the time from the problem specification to running the first experiments) and (2) the circuits evolved using these systems are useful and non-trivial.
Problem 1: Evolutionary Design of the 3×3-bit Multiplier
To demonstrate the method, we decided to design a high-performance system for evolving small combinational circuits, such as 3×3-bit multipliers, in a few seconds. These small circuits were initially evolved in software (extrinsically) by Miller et al. [7, 19]. A hardware implementation is expected to make the evolution faster.
Proposed Evolvable System
Virtual Reconfigurable Circuit: Figure 5 shows the architecture of the VRC, which was automatically generated from the specification given in Section 3.1. The circuit consists of 80 PEs (10 rows × 8 columns) equipped with flip-flops that allow pipelined processing. Each PE can be programmed to perform one of eight functions, which are shown in the same figure.
Any PE can be connected to one of the circuit inputs or to one of the outputs of the PEs placed in the previous column. In contrast to Miller's experiments, in which the inputs of a PE could be connected to a PE in any preceding column, we allowed interconnections between neighboring columns only. Although this restricts the search space substantially and probably makes the evolution of innovative designs impossible, it yields a relatively cheap hardware implementation that uses only inexpensive 16-input multiplexers in the interconnection network.
Because the VRC is used to evolve 3×3-bit multipliers, inputs 0-2 carry the first operand and inputs 3-5 carry the second operand of the multiplier. The 6-bit output is connected directly to the middle PEs of the last column.
Figure 6. Genetic unit
In order to define the behavior of the VRC, 880 configuration bits have to be uploaded. The configuration of each PE is defined using 11 bits: four bits define the connection of the first input, another four define the connection of the second input, and the remaining three bits select the function of the PE. The configuration bits are stored in an 880-bit configuration register realized using flip-flops available in the FPGA. Eight clocks are needed to completely change the configuration information, and thus the behavior, of the VRC.
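The configuration size follows directly from the architecture. The Python sketch below assumes one multiplexer per PE operand whose width is the number of primary inputs plus the number of PEs in a column (6 + 10 = 16 here, matching the 16-input multiplexers), and a function field just wide enough for the function set; `config_bits` is a hypothetical helper, not part of the design tools.

```python
import math

def config_bits(rows: int, cols: int, primary_inputs: int, functions: int) -> int:
    """Total configuration bits of a rows x cols VRC (assumed layout)."""
    sources = primary_inputs + rows            # multiplexer width per operand
    select = math.ceil(math.log2(sources))     # bits to select one operand
    func = math.ceil(math.log2(functions))     # bits to select the function
    return rows * cols * (2 * select + func)

bits = config_bits(rows=10, cols=8, primary_inputs=6, functions=8)
print(bits, bits // 8)   # 880 110: 880 bits, loaded in 8 clocks of 110 bits
```

Under the same assumptions (and assuming eight functions again), `config_bits(rows=10, cols=10, primary_inputs=7, functions=8)` gives 17-input multiplexers, 5 + 5 + 3 = 13 bits per PE and 1300 bits in total, which agrees with the figure reported for the 4×3 multiplier in Problem 2.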
Evolutionary Algorithm: Fig. 6 shows the hardware realization of the genetic unit generated using the EA Designer. The chromosome memory holds four 880-bit chromosomes, each divided into eight banks of 110 bits. The initial four-member population is generated randomly and evaluated. To simplify the hardware implementation, and in line with the results in [7], new populations are produced as follows. A mutated version of each chromosome is evaluated. If its fitness value is higher than the fitness value of the "parent" chromosome, the mutated chromosome replaces the parent in the chromosome memory. This is repeated for all chromosomes in the memory until a correct solution is found or the predefined number of generations is exhausted. Based on experiments, we set the mutation unit to invert four bits per chromosome on average.
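The replacement strategy described above can be modelled in software as follows. This is a Python sketch under stated assumptions, not the hardware genetic unit: each bit is inverted independently with probability mean_flips / n_bits (the text says only that four bits are inverted on average, not how), and `mutate` and `evolve` are hypothetical names. For Problem 1 the fitness would be the truth-table score with target 64 × 6 = 384; the demo below uses a toy fitness so that it runs in well under a second.

```python
import random
from typing import Callable, List, Tuple

def mutate(chrom: List[int], mean_flips: float, rng: random.Random) -> List[int]:
    """Invert each bit with probability mean_flips / len(chrom), so that
    mean_flips bits are inverted on average (assumed mutation scheme)."""
    p = mean_flips / len(chrom)
    return [bit ^ 1 if rng.random() < p else bit for bit in chrom]

def evolve(fitness: Callable[[List[int]], int], n_bits: int, target: int,
           pop_size: int = 4, mean_flips: float = 4.0,
           max_generations: int = 100_000, seed: int = 0) -> Tuple[int, List[int]]:
    """Return (generations used, best chromosome)."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    fit = [fitness(c) for c in pop]            # evaluate the random initial population
    for generation in range(max_generations):
        best = max(range(pop_size), key=fit.__getitem__)
        if fit[best] >= target:                # correct solution found
            return generation, pop[best]
        for i in range(pop_size):              # one mutant per population member
            child = mutate(pop[i], mean_flips, rng)
            child_fitness = fitness(child)
            if child_fitness > fit[i]:         # replace the parent only if strictly better
                pop[i], fit[i] = child, child_fitness
    best = max(range(pop_size), key=fit.__getitem__)
    return max_generations, pop[best]

# Toy run: maximize the number of 1s in a 32-bit chromosome
generations, best = evolve(fitness=sum, n_bits=32, target=32, mean_flips=2.0)
print(generations, sum(best))
```

Because each population member only competes with its own mutant, the hardware needs no selection or crossover circuitry, only a comparator per evaluation.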
The controller is responsible for communication between the genetic unit and the environment and for configuring the VRC. Pseudo-random numbers are generated using a linear feedback shift register (LFSR) seeded from software via the PCI bus.
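An LFSR of this kind can be modelled in a few lines. The register width (16 bits) and the tap mask 0xB400 (feedback polynomial x^16 + x^14 + x^13 + x^11 + 1, a known maximal-length choice) are assumptions for illustration; the text does not state which LFSR the controller uses. `lfsr16` is a hypothetical name, and its `seed` argument plays the role of the value written from software over PCI.

```python
from typing import Iterator

def lfsr16(seed: int) -> Iterator[int]:
    """16-bit Galois LFSR with tap mask 0xB400 (x^16 + x^14 + x^13 + x^11 + 1)."""
    if not 0 < seed < 1 << 16:
        raise ValueError("seed must be a non-zero 16-bit value")
    state = seed
    while True:
        lsb = state & 1        # bit shifted out of the register
        state >>= 1
        if lsb:
            state ^= 0xB400    # apply feedback to the tap positions
        yield state

rng = lfsr16(seed=0xACE1)      # the seed would come from software
print([hex(next(rng)) for _ in range(3)])
```

A maximal-length 16-bit LFSR visits all 65,535 non-zero states before repeating, and in hardware it costs only a shift register and a few XOR gates.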
Circuit Evaluation: The Fitness Designer is able to generate a circuit that evaluates the circuits uploaded into the VRC. In our case, this circuit generates 2^6 = 64 test vectors (all possible input combinations), applies them to the VRC input, reads the output vectors from the VRC and compares them against the required vectors. The fitness value is incremented for every correctly calculated output bit. It takes 8 clocks to obtain an output vector from the VRC; however, thanks to pipelined processing, one output vector is available per clock. In the current version, the fitness value is available in 64 + 8 = 72 clocks, where the 8 clocks represent the configuration and communication overhead. This overhead could be reduced by pipelined reconfiguration (which has not been implemented yet).
Synthesis: After simulation in ModelSim, the design was synthesized using LeonardoSpectrum for the Virtex II FPGA XC2V3000bf957 available on the COMBO6 card. The complete synthesis process took about 25 min on a Sun Blade. The whole evolvable system requires 403,372 equivalent gates. Tables 1 and 2 summarize the results of synthesis.
In this implementation the population is stored in Block RAMs. The design can operate at up to 93.3 MHz. The results described in the next section were nevertheless obtained at only 50 MHz, because this simplifies synchronization with the PCI interface. However, there is potential to exceed 120 MHz by optimizing some parts of the design. Figure 7 shows an example of the evolved 3×3-bit multiplier. Our analysis has shown that the circuit utilizes 45
Results
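The time of evolution can be estimated as

$$t = \frac{g \cdot p \cdot (v + c)}{f_m},$$

where g is the number of generations, p is the population size, v is the number of test vectors, c denotes the overhead (in clock cycles) and f_m is the operating frequency.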
Considering f_m = 100 MHz (which we expect to reach with the optimized design), the average design time is 15.5 s.
Problem 2: The 4×3 Multiplier
It is easy to modify the specification file and to synthesize the evolvable system for designing other circuits, for example, 4×3 multipliers. The results of synthesis are given in Table 3. The maximum operating frequency is 89.4 MHz. Figure 8 shows an example of the evolved 4×3 pipelined multiplier. The VRC contains 10 × 10 PEs and uses 10 × 130 = 1300 configuration bits. Since the corresponding truth table is two times larger than in the previous case (2^7 = 128 test vectors instead of 64), the fitness calculation is two times slower.
We performed 19 runs and obtained fully correct multipliers in 10 of them, after 265 million generations on average. This corresponds to an average design time of about 48 minutes (at 50 MHz). The fastest run required 44 million generations, i.e. 504 seconds.
Discussion
The experiments have confirmed that the design time of the proposed evolvable systems is short. Starting from the specification, the design can be completed in a few hours, including synthesis, placement and routing. It is easy to go back, modify some parts and synthesize a completely new evolvable system. The main advantage is that the method is based on a software approach; no manipulation of physical hardware is needed if a card or board containing a sufficiently large FPGA is available.
It is impossible to compare the presented results directly with the results from [19], for instance, because of the different interconnection strategy of the PEs. Recall that the best 3×3-bit multiplier evolved in [19] consists of 23 gates and that the evolution was carried out using an array of 1 × 35 PEs with unrestricted interconnection of the PEs. Performing 10,000 generations takes about 2 s on a 200 MHz Pentium [19] (we estimate 0.8 s on a current 2.6 GHz Pentium IV). In some cases, a few million generations were needed to find a solution.
The speedup obtained with COMBO6 (f_m = 100 MHz), calculated from the available values, is 69 against the Pentium and 28 against the Pentium IV. Note that this compares the time per generation, not the total time of evolution. Table 1 indicates that four VRCs could be implemented on the same FPGA, allowing four times higher performance.
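The speedup figures can be reproduced from the time per generation. The sketch assumes the evolution-time model t = g·p·(v + c)/f_m with p = 4, v = 64 and c = 8 as in Problem 1; this model is inferred from the definitions in the Results section, and it does reproduce the quoted values 69 and 28. `evolution_time` is a hypothetical helper name.

```python
def evolution_time(g: float, p: int, v: int, c: int, f_m: float) -> float:
    """Estimated evolution time in seconds: t = g * p * (v + c) / f_m."""
    return g * p * (v + c) / f_m

# One generation on COMBO6: 4 chromosomes x (64 test vectors + 8 overhead clocks)
per_generation = evolution_time(1, p=4, v=64, c=8, f_m=100e6)   # 2.88 us

pentium = 2.0 / 10_000    # about 2 s per 10,000 generations on a 200 MHz Pentium [19]
pentium4 = 0.8 / 10_000   # estimated 0.8 s per 10,000 generations on a 2.6 GHz Pentium IV
speedup = (pentium / per_generation, pentium4 / per_generation)
print([round(s) for s in speedup])  # [69, 28]
```

Since the time per generation is linear in v + c, halving the 8-clock overhead (for example, by pipelined reconfiguration) would improve these ratios by only about 6 percent; the larger gain would come from running several VRCs in parallel.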
Nevertheless, from this point of view the obtained speedup may seem low. The reason is that candidate circuits are very easy to evaluate in software (if a proper encoding is used). This is not the case when evolving analog circuits, for which circuit simulators are usually very slow. Hence a hardware implementation (JPL's FPTA, for instance) can reduce the time of the evolutionary design of analog circuits by 4+ orders of magnitude compared with PSPICE running on a Pentium II 3000 Pro [15]. The proposed approach utilizing VRCs seems to be useful for designs in which circuit evaluation is very time consuming in software. A typical example is the evolutionary design of image filters [10].
The evolvable system proposed in Section 5 is intended to speed up the evolutionary design of small combinational circuits. However, its implementation is very area-demanding in comparison to the size of the evolved circuits. The approach is much more suitable for evolvable hardware at the functional level, in which programmable elements perform more complex functions (such as addition, minimum, maximum, etc.) over words instead of bits. Note that to support functional-level evolution, only the datapath size (BIT parameter) has to be modified in the code given in Section 3.1. Therefore, it seems reasonable to apply the approach in real-world systems in which evolution is responsible for adaptation.
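To illustrate why a single datapath-width parameter can switch between gate-level and functional-level PEs, the sketch below builds a word-level function set whose only parameter is the word width. It is a Python illustration with a hypothetical function set and name (`functional_pe_ops`), not the generated VHDL.

```python
from typing import Callable, Tuple

Op = Callable[[int, int], int]

def functional_pe_ops(bit: int) -> Tuple[Op, ...]:
    """Word-level function set for a datapath of `bit` bits (hypothetical)."""
    top = (1 << bit) - 1  # largest representable word
    return (
        lambda a, b: (a + b) & top,        # addition, wrapping modulo 2**bit
        lambda a, b: min(a + b, top),      # saturating addition
        lambda a, b: min(a, b),            # minimum
        lambda a, b: max(a, b),            # maximum
    )

ops8 = functional_pe_ops(8)   # 8-bit words, e.g. image pixels
print([op(200, 100) for op in ops8])   # [44, 255, 100, 200]
```

With `bit = 1` the same four operations degenerate to XOR, OR, AND and OR, so one template can span both the gate-level multipliers of Section 5 and functional-level designs such as image filters.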
We should also mention that it was not our goal to minimize the number of gates used in the evolved circuits. In evolvable adaptive systems (in which the evolutionary algorithm is part of the target system), there is usually no requirement to minimize the number of elements used in evolved circuits. All the PEs physically exist in the system and are available to evolution at no extra cost, as opposed to the strategy used in the evolutionary design of a single circuit [10]. An advantage of the evolved multipliers is that they are inherently pipelined, which is useful for processing large data sets.
Conclusions
An approach to the routine design of high-performance evolvable systems has been introduced in this paper. Using the proposed method and tools, we were able to quickly design complete evolvable systems in a physical FPGA. The design time was reduced drastically in comparison to previous approaches. The created systems were used to evolve small combinational circuits in a very short time; in particular, we evolved pipelined multipliers.
We believe that the proposed approach represents a step towards the routine design of evolvable systems. In fact, the proposed method transforms the problem of digital evolvable hardware design entirely into the software domain. Future work will be devoted to extending the design tools in order to generate other types of virtual reconfigurable circuits and evolutionary algorithms automatically. We will also work on improving the quality of the evolvable systems (hardware) generated using the tools.
