Modern Field-Programmable Gate Arrays (FPGAs) are becoming very popular in embedded systems and high-performance applications. FPGAs have benefited from the shrinking of transistor feature size, which allows more on-chip reconfigurable resources (e.g., memories and look-up tables) and routing resources. Unfortunately, the amount of reconfigurable resources in an FPGA is fixed and limited. This paper investigates an application-mapping scheme for FPGAs that combines sequential processing units with task-specific hardware. A Genetic Algorithm is used to search for the mapping. We found that placing sequential processor cores into an FPGA can improve resource utilization efficiency while achieving acceptable system performance. In this paper, two cases were studied to determine the trade-off between resource optimization and system performance.
INTRODUCTION
In recent years, Field-Programmable Gate Arrays (FPGAs) have gained popularity in the digital integrated circuit market, specifically in high-performance embedded applications. One of the most significant features of FPGAs is that designers can configure them to implement complex hardware in the field. With the improvement of integrated circuit technology, very large logic structures can reside in a single FPGA chip [1]. Not only application-specific functional units but also embedded processors can be configured into FPGAs. Once a processor is implemented in the FPGA, it can be reused many times in a time-shared manner. Unfortunately, the sequential nature of processors limits system performance [2].
It is important to note that FPGA resources are limited. When a task system requires more hardware resources than a single FPGA chip provides, the system cannot be realized as pure hardware on that FPGA. If we configure processor IP cores into the FPGA, some portions of the design can instead be implemented as C programs running on those cores. For example, we can implement a task system that consumes 6,674 Configurable Logic Blocks (CLBs) on a single FPGA chip with just 1,920 CLBs by placing one soft processor IP core into the FPGA. This allows the finite resources of an FPGA to be used with optimum efficiency. However, processor IP cores consume both the hardware logic (LUTs) and the block RAM resources of the FPGA, so the number of processor IP cores that can be placed into an FPGA is itself limited. We found that it is not desirable to put the maximum number of processor IP cores into an FPGA, because the sequential operation of the processors unnecessarily restricts system performance. In this paper, two cases were explored using the Finite Resource Optimization Analysis Model in order to obtain the optimal utilization scheme for the finite FPGA resources. This paper is organized as follows: Section 2 gives an overview of FPGA architecture; Section 3 presents the FPGA Finite Resource Optimization Analysis Model with Genetic Algorithm; Sections 4 and 5 provide a computational complexity analysis with an example; Section 6 presents simulations and results; and Section 7 presents the conclusion.
FPGA ARCHITECTURES
The basic architecture of FPGAs consists of an array of logic blocks, programmable interconnects, and I/O blocks. A logic block which includes a fixed number of LUTs and flip-flops is called a configurable logic block (CLB) or a logic array block (LAB). In this paper, CLB is used.
The architecture of modern FPGAs is becoming more complicated. It includes more resource elements, such as embedded memory blocks (bRAMs), multipliers, and even processor IP cores. As an example, a Xilinx Virtex-II Pro FPGA includes an array of CLBs, IOBs, multipliers, block RAM, embedded RocketIO, and two processors (IBM PowerPC 405) [1]. With these modern FPGAs, it is possible to configure multiple soft processor cores into a single FPGA device.
FPGA FINITE RESOURCE OPTIMIZATION ANALYSIS MODEL WITH GENETIC ALGORITHM
The model uses an FPGA resource assignment algorithm based on a Genetic Algorithm. One of the inputs to the algorithm is an FPGA resource list, which includes the number of CLBs in the given FPGA device, the number of soft processors that can be integrated into it, and the number of CLBs and the amount of bRAM that each soft processor uses. The other input is the task system list, which gives detailed information about the application, including the number of tasks and how these tasks are related. The task system is broken down into a set of tasks, each of which can be executed either as a software process on a soft processor core or as a hardware function (implemented in a hardware description language) within the FPGA. When the task system is created, we define the software and hardware execution time of each task from profiling and hardware synthesis reports, respectively. Sample task system data is shown in Table 1 [3]. We use a directed acyclic graph (DAG) to represent a task system [4]. A sample DAG task system is shown in Figure 2.
Figure 2. A sample task system DAG
The FPGA resource assignment algorithm generates the FPGA resource utilization information and the task system schedule. The former shows how the FPGA resources (e.g., CLBs and memory) are used. The latter gives the schedule length (in units of time) of the task system. The shorter the schedule length, the more desirable the solution.
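The two inputs and the schedule length described above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the four-task DAG, the timing and CLB values, the names `Task`, `schedule_length`, and `clbs_used`, and the simplified scheduling rule (hardware tasks start as soon as their predecessors finish, software tasks additionally wait for their processor, and communication cost is ignored). It is not the paper's scheduler.

```python
from collections import namedtuple

# Hypothetical task record: software time, hardware time, hardware area.
Task = namedtuple("Task", ["sw_time", "hw_time", "hw_clbs"])

# Illustrative four-task system (values invented for this sketch).
TASKS = {
    "A": Task(sw_time=10, hw_time=2, hw_clbs=300),
    "B": Task(sw_time=8, hw_time=3, hw_clbs=200),
    "C": Task(sw_time=6, hw_time=1, hw_clbs=400),
    "D": Task(sw_time=4, hw_time=2, hw_clbs=100),
}
# DAG stored as task -> list of predecessor tasks.
PREDS = {"A": [], "B": ["A"], "C": ["A"], "D": ["B", "C"]}


def topological_order(preds):
    """Return the tasks so that every task follows all of its predecessors."""
    remaining = {t: set(p) for t, p in preds.items()}
    order = []
    while remaining:
        ready = sorted(t for t, p in remaining.items() if not p)
        if not ready:
            raise ValueError("task graph has a cycle")
        for t in ready:
            order.append(t)
            del remaining[t]
        for p in remaining.values():
            p.difference_update(ready)
    return order


def schedule_length(tasks, preds, assignment, num_processors):
    """assignment maps each task to "hw" or to a processor index."""
    proc_free = [0] * num_processors  # time at which each processor is idle
    finish = {}
    for t in topological_order(preds):
        ready = max((finish[p] for p in preds[t]), default=0)
        where = assignment[t]
        if where == "hw":
            finish[t] = ready + tasks[t].hw_time
        else:
            start = max(ready, proc_free[where])
            finish[t] = start + tasks[t].sw_time
            proc_free[where] = finish[t]
    return max(finish.values())


def clbs_used(tasks, assignment, num_processors, clbs_per_processor=500):
    """CLBs taken by hardware tasks plus the soft processors themselves."""
    hw = sum(tasks[t].hw_clbs for t, w in assignment.items() if w == "hw")
    return hw + num_processors * clbs_per_processor
```

With these invented numbers, mapping every task to hardware gives a schedule length of 7 time units, while running every task on a single processor gives 28, which illustrates the area-time trade-off the algorithm has to balance.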
The FPGA Resource Assignment Algorithm
The FPGA Resource Assignment Algorithm is based on a Genetic Algorithm (GA). The basic operations of a GA are initialization, evaluation, selection, reproduction, and termination [4]. The initial population (list scheduling is used) is randomly initialized by assigning tasks to the available hardware resources or to soft processors. Each member of the population is represented by a separate data structure. The fitness of each candidate is its schedule length, and candidates are selected for the subsequent generation based on this fitness. A set of parents with the best genetic information (shorter schedule length) is selected to breed the offspring, using roulette-wheel-style selection. The reproduction process creates the next generation of the population through two genetic operations: crossover and mutation. A single-point crossover technique is used. We randomly select a location on the chromosome structure as the crossover point, and the new candidate is generated by combining two parent candidates at that point: one part comes from the first parent chromosome above the crossover point, and the other part comes from the second parent chromosome below it. The mutation operation takes place according to a given mutation probability. When mutation occurs, the task assignments are selected at random. When mutation does not occur, there is an equal probability that the genes are chosen from either parent [2]. The evaluation, selection, and reproduction processes are repeated until a termination condition is reached. The process is shown in Figure 3.
Figure 3. Genetic Algorithm Process.
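The selection and reproduction steps described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the paper's implementation: a chromosome is assumed to be a list with one gene per task (either "hw" or a processor index), roulette weights are assumed to be the inverse of the schedule length so that shorter schedules are favored, and `toy_schedule_length` is a hypothetical stand-in for the real fitness evaluation. All function names are illustrative.

```python
import random


def roulette_select(population, schedule_length, rng):
    """Pick one candidate; shorter schedules get proportionally more weight."""
    weights = [1.0 / schedule_length(c) for c in population]
    return rng.choices(population, weights=weights, k=1)[0]


def single_point_crossover(parent_a, parent_b, rng):
    """Take genes before the cut from parent_a and the rest from parent_b."""
    point = rng.randrange(1, len(parent_a))  # needs at least two genes
    return parent_a[:point] + parent_b[point:]


def mutate(chromosome, targets, pm, rng):
    """With probability pm per gene, reassign the task to a random target."""
    return [rng.choice(targets) if rng.random() < pm else gene
            for gene in chromosome]


def next_generation(population, schedule_length, targets, pm, rng):
    """Breed a new population of the same size as the current one."""
    children = []
    for _ in range(len(population)):
        a = roulette_select(population, schedule_length, rng)
        b = roulette_select(population, schedule_length, rng)
        child = single_point_crossover(a, b, rng)
        children.append(mutate(child, targets, pm, rng))
    return children


def toy_schedule_length(chromosome):
    """Hypothetical fitness: a software gene costs 5 time units, hardware 1."""
    return sum(1 if gene == "hw" else 5 for gene in chromosome)
```

A run would start from `rng = random.Random(seed)` and call `next_generation` repeatedly until the termination condition is met. Passing the generator explicitly keeps each run reproducible for a given seed.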
COMPUTATIONAL COMPLEXITY OF GENETIC ALGORITHM
In this section, we discuss the time complexity and efficiency of the FPGA resource assignment algorithm. Many factors influence GA efficiency; the following parameters are considered in this paper: the population size (p), the number of generations (i), the probability of mutation (pm), and the number of soft processors to be configured into the FPGA (s).
To determine the complexity of this GA and measure its efficiency, tests were carried out on a computer with a 3 GHz Pentium 4 processor and 1 GB of memory running CentOS 4.3 (kernel 2.6.9). The number of tasks in the task system is 10, 16, ... From computational complexity theory, the time complexity of this GA is $O(p^2)$. Before the crossover and mutation operations, all candidates in a population are sorted with a bubble sort algorithm, and the time complexity of this sorting operation is $O(p^2)$ [5]. Because sorting is the most expensive step in this GA implementation, $O(p^2)$ represents the time complexity of the GA. The population size p is also an important parameter because it directly determines the size of the search space. Enlarging the search space greatly increases the computational cost, so the population size must be selected carefully. If it is too small, the search space does not contain enough candidate solutions and the GA produces poor results. If it is too large, the running time grows and the efficiency of the GA drops [6].
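The quadratic cost of the sorting step can be made concrete by counting comparisons. The sketch below is a plain bubble sort without early exit, written for illustration only; the function name and the use of a `key` callable are assumptions, not details taken from the paper. For p candidates it always performs p(p-1)/2 comparisons.

```python
def bubble_sort_by_fitness(candidates, key):
    """Sort candidates by key (ascending) and count the comparisons made."""
    items = list(candidates)  # leave the caller's list untouched
    comparisons = 0
    n = len(items)
    for i in range(n - 1):
        # After pass i, the last i items are already in their final place.
        for j in range(n - 1 - i):
            comparisons += 1
            if key(items[j]) > key(items[j + 1]):
                items[j], items[j + 1] = items[j + 1], items[j]
    return items, comparisons
```

Doubling the population from 10 to 20 candidates raises the count from 45 to 190 comparisons, roughly a fourfold increase, which is the behavior expected of an $O(p^2)$ step.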
FPGA RESOURCE UTILIZATION ANALYSIS CASE STUDIES
In this experiment, the Power Quality Monitor System (PQMS) [4] is analyzed and targeted to a Xilinx XC3S1000 FPGA using different assignment schemes to test the utilization of the FPGA resources. The PQMS is designed to measure the quality and reliability of the power system. It can be broken down into ten tasks. Table 2 shows the hardware and software performance data of the PQMS. A Xilinx XC3S1000 FPGA consists of 1,920 CLBs and 55,296 bytes of bRAM. We assume that each Xilinx MicroBlaze soft processor consumes 500 CLBs. If all the tasks were implemented in HDL, the design would require 12,020 CLBs. This is more than the XC3S1000 provides; thus, this task system cannot be implemented in pure HDL. As shown in Table 2, three cases, with one, two, and three MicroBlaze soft processors, were analyzed using the Finite Resource Optimization Analysis Model. When one MicroBlaze is used, the PQMS task system fits into the FPGA because some of the tasks are assigned to the soft processor and the others are implemented in HDL. The task system consumes 95.408% of the FPGA hardware resources, and the design achieves acceptable system performance. When two MicroBlazes are used, the hardware resource utilization is 90.668% and better system performance is obtained. When three MicroBlazes are used, both resource utilization and system performance become worse. The reason is that as more tasks are assigned to the soft processors, performance suffers from their highly sequential operation; thus, the best area-time trade-off in this example is to use two soft processors. With this configuration, some tasks execute on the soft processors and the rest are implemented in HDL.
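The CLB budget behind these cases follows from simple arithmetic on the figures quoted above (1,920 CLBs on the device, 500 CLBs per soft processor, 12,020 CLBs for an all-HDL design). The sketch below only encodes that arithmetic. The function names are illustrative, bRAM is ignored, and the per-task CLB values passed to `utilization` are hypothetical, since the per-task data of Table 2 is not reproduced here.

```python
TOTAL_CLBS = 1920          # Xilinx XC3S1000, as stated in the text
CLBS_PER_PROCESSOR = 500   # the paper's assumption for one soft processor
ALL_HDL_CLBS = 12020       # CLBs needed if every task is built in HDL


def hardware_budget(num_processors):
    """CLBs left for HDL tasks after placing the soft processors."""
    return TOTAL_CLBS - num_processors * CLBS_PER_PROCESSOR


def utilization(hw_task_clbs, num_processors):
    """Percentage of device CLBs used; raises if the design does not fit."""
    used = num_processors * CLBS_PER_PROCESSOR + sum(hw_task_clbs)
    if used > TOTAL_CLBS:
        raise ValueError(f"design needs {used} CLBs, device has {TOTAL_CLBS}")
    return 100.0 * used / TOTAL_CLBS
```

Under these figures, one, two, and three processors leave 1,420, 920, and 420 CLBs for hardware tasks, and a fourth processor would not fit at all. Each added processor therefore removes 500 CLBs from the hardware budget and pushes more tasks into sequential software execution.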
The schedule lengths in Table 3 are approximate and scaled: the actual schedule length is 1,000 times the value shown in the table. Ten different random seeds are used for the GA in the simulation for each row. Only the best result from each run is shown in Table 3.
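One way to read the multi-seed procedure above is as a best-of-N protocol: repeat a stochastic search with different seeds and report the shortest schedule found. The sketch below illustrates only that protocol. `run_once` is a hypothetical stand-in (a random search over an invented cost range) for a full GA run, and the seed values 0 to 9 are assumptions.

```python
import random


def run_once(seed, trials=50):
    """Stand-in for one GA run: returns the best of several random costs."""
    rng = random.Random(seed)
    return min(rng.randint(100, 200) for _ in range(trials))


def best_of_seeds(run, seeds):
    """Run once per seed and return (best_seed, best_length, all_results)."""
    results = {seed: run(seed) for seed in seeds}
    best_seed = min(results, key=results.get)
    return best_seed, results[best_seed], results


best_seed, best_length, results = best_of_seeds(run_once, range(10))
```

Keeping the full `results` dictionary alongside the best value makes it possible to report the spread across seeds as well as the best case.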
MORE COMPREHENSIVE SIMULATIONS AND RESULTS
In this section, we perform more simulations and extend the results to a 30-task system, the Space Shuttle Turbo Pump Task System [7]. This time, we selected the Xilinx XC3S5000, with 8,320 CLBs and 234 Kbytes of bRAM. We again assume that each soft processor consumes 500 CLBs. The FPGA Finite Resource Optimization Analysis Model is used to determine the best possible performance within the available resources. Table 4 contains the results of the simulations.
For each simulation configuration setup (1-8) in Table 4, a randomly selected seed was used to start the Genetic Algorithm simulations. A total of eighty simulations were completed. Generally, for this task system, increasing the number of processors has a positive impact on performance and resource utilization, but this result does not mean that using more processors always yields better performance.
CONCLUSIONS
This paper presents a basic overview of a genetic algorithm with resource utilization analysis for FPGAs. Using the finite FPGA resources with optimum efficiency is an important problem in FPGA design. Integrating soft processors into FPGAs can greatly improve resource utilization efficiency, but more soft processors do not necessarily mean better performance. In fact, too many soft processors can degrade performance because of their sequential operation. For different applications and given FPGA devices, we show that the trade-off between resource utilization and system performance can be found using the FPGA Finite Resource Optimization Analysis Model.
