An integrated platform for fast genetic operators is presented to support intrinsic evolution on Xilinx Virtex II Pro Field Programmable Gate Arrays (FPGAs). Dynamic bitstream compilation is achieved by directly manipulating the bitstream using a layered design. Experimental results on a case study show that a full design as well as a full repair is achievable using this platform, with an average time of 0.4 microseconds to perform the genetic mutation, 0.7 microseconds to perform the genetic crossover, and 5.6 milliseconds for one input pattern intrinsic evaluation. This represents a performance advantage of three orders of magnitude over JBits and more than seven orders of magnitude over the Xilinx design tool driven flow for realizing intrinsic genetic operators on a Virtex II Pro device.
INTRODUCTION
Intrinsic evolutionary approaches such as Genetic Algorithms (GAs) are used throughout the literature to realize hardware-in-the-loop FPGA-based system design and repair strategies [1] [2] [3]. They realize search algorithms based on Darwinian evolution principles by performing genetic operations such as mutation and crossover. Many variations of GAs have been introduced to enhance the performance and speed of convergence to a solution for FPGA-based systems [4]. However, many of these algorithm implementations are software-in-the-loop simulations rather than real implementations on the FPGA fabric. Challenges of realizing practical intrinsic evolutionary strategies include mapping the genotype in the GA to its corresponding phenotype on the fabric, and the limited control over automating the alteration and download of safe bitstreams onto the device. These issues are exacerbated when critical portions of the bitstream representation are proprietary.
In this paper, an approach is presented that provides a fast interface between the GA and the FPGA device via a straightforward data structure and Application Programming Interfaces (APIs). A layered design is used to perform mapping operations directly on the bitstream to modify LookUp Table (LUT) configurations and reprogram the device. In addition, it supports Input/Output transfers via the JTAG standard serial port for fitness measurement purposes.
The remainder of the paper is organized as follows: Section 2 provides an overview of related work. Section 3 introduces the platform design. Section 4 discusses the experimental design and results, and Section 5 concludes the paper and suggests a direction for future work.
RELATED WORK
There are two paradigms for implementing GAs in reconfigurable applications: Extrinsic Evolution via functional models that abstract the physical aspects of the real device, and Intrinsic Evolution on the actual devices. It is evident that extrinsic approaches simplify the evolution process as they operate on software models of the FPGAs.
However, for applications like in-situ fault handling on deep space missions, not all fault types can be readily accommodated by software models. Additionally, abstracting the physical aspects of the target device complicates rendering the final designs into actual on-board circuits. For instance, properties such as routability of the design cannot be ensured until the final stages of the configuration process. For these reasons, intrinsic evolution can provide a direct approach to realizing physical designs for a specific FPGA device.
Several previous research efforts have addressed intrinsic evolution. A successful attempt on Field Programmable Transistor Array (FPTA) chips was carried out in [3]. The authors proposed new ideas for long-term hardware reliability using evolvable hardware techniques via an evolutionary design tool (EHWPack) that facilitates intrinsic evolution by incorporating the PGAPack genetic engine with a Labview test-bed running on a UNIX workstation. They were able to intrinsically evolve a digital XNOR gate on two connected FPTA boards. In this paper, we target FPGAs rather than FPTAs, namely the popular Xilinx Virtex II Pro device.
Miller, Thomson, and Fogarty [2] previously addressed the importance of direct evolution on the Xilinx 6216 FPGA devices; the research explored the effect of the device physical constraints on evolving digital circuits. A mapping between the representation genotype and the device phenotype was proposed; however, no implementation details were presented.
Hollingworth, Smith, and Tyrrell developed an intrinsic evolution platform for a 2-bit adder on a Xilinx FPGA with partial reconfiguration to improve evolution time [6]. However, they used the JBits interface for run-time reconfiguration. JBits is Java-based and, being interpreted, can face scalability and performance issues.
In previous work, a Multilayer Runtime Reconfiguration Architecture (MRRA) was developed for Autonomous Runtime Partial Reconfiguration of FPGA devices [7]. The tool comprises three layers (Logic, Translation, and Reconfiguration layers) with well-defined interfaces for modularity and reuse. In addition, a standard set of Application Programming Interfaces (APIs) was utilized for communication with the target device. Results showed the ability of the framework to support autonomous and dynamic reconfiguration operations. In this paper, MRRA is extended to support genetic operators directly to realize intrinsic evolution on Xilinx Virtex II Pro devices, as discussed in the following sections.
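As a rough illustration of what genetic operators acting on LUT configurations might look like, the following Python sketch represents a genotype as a list of 16-bit LUT contents and gates each operator by a rate. The function names, the single-bit-flip mutation, and the single-point crossover at LUT boundaries are assumptions made for illustration; the paper does not specify the operator variants, and this is not the MRRA API.

```python
import random

LUT_BITS = 16  # one 4-input LUT holds 16 configuration bits


def mutate(genotype, rate, rng):
    """Hypothetical mutation: with probability `rate`, flip one random bit
    in one randomly chosen LUT. Returns a new genotype list."""
    child = list(genotype)
    if rng.random() < rate:
        lut = rng.randrange(len(child))
        bit = rng.randrange(LUT_BITS)
        child[lut] ^= 1 << bit
    return child


def crossover(parent_a, parent_b, rate, rng):
    """Hypothetical single-point crossover at a LUT boundary, applied with
    probability `rate`. Returns two new genotype lists."""
    if len(parent_a) != len(parent_b):
        raise ValueError("parents must have the same number of LUTs")
    if len(parent_a) < 2 or rng.random() >= rate:
        return list(parent_a), list(parent_b)
    cut = rng.randrange(1, len(parent_a))  # cut point strictly inside the list
    child_a = parent_a[:cut] + parent_b[cut:]
    child_b = parent_b[:cut] + parent_a[cut:]
    return child_a, child_b
```

Because `rng.random()` returns a value in [0, 1), a rate of 1.0 applies the operator on every call and a rate of 0.0 never applies it. Passing an explicit `random.Random` instance keeps runs reproducible.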
JTAG-DRIVEN PLATFORM
The developed platform consists of hardware components that reside on the FPGA chip and software components that reside on the host PC. However, they are developed as layered modules that can be readily migrated to run on the on-chip PowerPC in later phases of this research. The main components of the platform are shown in Fig. 1.
INTRINSIC EVOLUTION CASE STUDY

Experimental Design
The circuit used to demonstrate the platform workflow is a 4-bit x 4-bit adder. It provides a tractable circuit for the GA to evolve that exhibits characteristics of large arithmetic circuits, including a variable amount of redundancy and combinational logic behavior. The circuit layout on the Xilinx Virtex II Pro chip is shown in Fig. 2. The GA parameters used throughout the experiments are shown in Table 1. A total of 8 LUTs were used in the design experiments; this number was increased to 13 LUTs in the repair experiment to add some redundancy margin for the GA to evolve within. All GA parameters were obtained by running extrinsic evolution of the GA and identifying the optimal values. Table 1 shows the range of tested values for each parameter along with the optimal one. Population sizes between 5 and 20 were evaluated, and the best results were achieved using a population size of 10. Crossover rates in the range 30%-90% (in increments of 10%) were tested, and the GA performed best when the value was set to 60%. The other parameters were selected in the same way.
Repair:
A single stuck-at fault was adopted as a case study to show the capability of the platform to repair a faulty circuit. Since an actual fault cannot be readily or precisely introduced into the device, the circuit is made to behave as if the fault actually exists. This is complicated by the fact that the platform allows only functional logic manipulation, without the possibility of altering the device interconnects. Hence, the bitstream was processed directly before configuring the device to modify the contents of one LUT so that it behaves as if a stuck-at fault is present. The LUT in the Virtex II Pro chip is a 16-bit lookup table with four inputs and one output. If the Least Significant Bit (LSB) input pin is stuck-at zero, only the memory locations matching the pattern (XXX0)_2, where X denotes a Don't Care, will be accessible.
This behavior can be achieved by copying the contents of the memory locations matching the pattern (XXX0)_2 into those matching (XXX1)_2, overwriting their old values, as shown in Fig. 3. Likewise, if the second LSB input pin is stuck-at one, then by the same analysis any reference to (XX0X)_2 should be directed to (XX1X)_2. The same concept extends to the general case: the location of the faulty pin determines the stride between the memory locations to copy, and the stuck-at value (zero or one) determines the direction of the copy operation (left or right), as shown in Fig. 3.
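The copy rule described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the LUT contents are modeled as a 16-bit integer in which bit `i` holds the output for input address `i`, with pin 0 as the LSB input. The function name `emulate_stuck_at` is hypothetical, and the actual ordering of LUT bits inside a Virtex II Pro bitstream is not addressed here.

```python
def emulate_stuck_at(lut, pin, value):
    """Return LUT contents that behave as if input `pin` were stuck at `value`.

    Assumed model: bit i of `lut` is the output for 4-bit input address i,
    and pin 0 is the least significant address bit.
    """
    if not 0 <= lut <= 0xFFFF:
        raise ValueError("lut must be a 16-bit value")
    if pin not in range(4):
        raise ValueError("pin must be 0..3")
    if value not in (0, 1):
        raise ValueError("value must be 0 or 1")

    stride = 1 << pin  # distance between the paired memory locations
    faulty = 0
    for addr in range(16):
        # Force the stuck pin to its fixed value to find the location
        # that the faulty LUT would actually read.
        src = (addr | stride) if value else (addr & ~stride)
        faulty |= ((lut >> src) & 1) << addr
    return faulty
```

For example, under this model a LUT that simply passes pin 0 through (`0xAAAA`) becomes constant zero when pin 0 is stuck-at zero, and a 4-input AND (`0x8000`) with pin 3 stuck-at one becomes a 3-input AND of the remaining pins (`0x8080`).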
Results:
Five intrinsic evolutions were achieved for each of the unseeded, seeded, and repair experiments using the presented platform. The GA parameters listed in Table 1 were used. Several aspects were measured to quantify the capability of the platform; experimental results are listed in Table 2. It can be seen from the results that the GA operators' time is small compared to the fitness measurement time. Moreover, it is very small compared to the device programming time, which was found to be 22 seconds. Device programming time is high for two reasons. First, the JTAG serial port, which can work at 300 Kbps [9], was used rather than the SelectMap interface, which can operate at a maximum clock speed of 66 MHz [10]. Second, the 548 Kbyte full bitstream file was used rather than the 80 Kbyte partial reconfiguration file.
[Fig. 4]
In Table 2, the timing measurement of the probability-driven mutation and crossover operators for each run is listed. The mutation and crossover average times throughout the runs were around 0.2 and 0.4 microseconds respectively. To measure the exact time that the mutation and crossover operations require, another experiment was carried out by setting the mutation and crossover rates to 100% to ensure that the operators are performed with certainty. This allowed measurement of the time for each operation individually. The results of this experiment, and of similar experiments using the Xilinx design tool driven flow and using JBits, are listed in Table 3. It can be seen from the results that an enhancement of more than seven orders of magnitude over the Xilinx design tool driven flow and of three orders of magnitude over JBits was achieved by the developed platform.
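The contribution of the serial link and the file size to the programming time can be checked with a back-of-the-envelope calculation. The sketch below assumes 1 Kbyte = 1024 bytes and 1 Kbps = 1000 bits per second, and ignores JTAG protocol and host-side overhead, so it gives only a lower bound on transfer time; the function name is hypothetical.

```python
def transfer_seconds(size_kbytes, rate_kbps):
    """Lower-bound transfer time for a file over a serial link.

    Assumes 1 Kbyte = 1024 bytes and 1 Kbps = 1000 bit/s, with no
    protocol or software overhead.
    """
    bits = size_kbytes * 1024 * 8
    return bits / (rate_kbps * 1000)


full_bitstream = transfer_seconds(548, 300)     # full bitstream over JTAG
partial_bitstream = transfer_seconds(80, 300)   # partial reconfiguration file
```

Under these assumptions the full bitstream needs roughly 15 seconds on the wire, which is of the same order as the measured 22 seconds, while the partial reconfiguration file would need a little over 2 seconds at the same rate.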
CONCLUSION AND FUTURE WORK
An intrinsic evolution platform is developed for genetic operators and fitness assessment using API layers which directly manipulate the configuration bitstream on Xilinx Virtex II Pro devices. Communication between the host PC and the FPGA device is carried out via the JTAG port. GNAT is utilized for intrinsic fitness measurement. Three experiments were conducted: unseeded design, seeded design, and repair. Experimental results have shown successful evolution with an average time of 0.4 microseconds to perform the genetic mutation, 0.7 microseconds to perform the genetic crossover, and 5.6 milliseconds for one input pattern intrinsic evaluation. Future work is proceeding towards a System-on-Chip version using the PowerPC to execute the genetic algorithm. This will reduce the significance of the data transfer time relative to genetic operator time.
