Abstract-Reconfigurable systems on chip are well known for their flexibility in high-performance embedded systems. Hardware/software (HW/SW) partitioning is the most important phase in the design of a reconfigurable system on chip. Many different algorithms have been adopted to solve the HW/SW partitioning problem. The Shuffled Frog Leaping Algorithm (SFLA) is popular for its simple concepts, few parameters to adjust, high calculation speed, strong global search capability and ease of implementation. In this paper, we apply SFLA to the coarse-grained HW/SW partitioning problem on a reconfigurable system on chip.
I. INTRODUCTION
With the development of microelectronics and computer technology, and especially the large-scale emergence of FPGA programmable devices, the idea of remodeling circuits in real time has attracted increasing attention from researchers. In reconfigurable systems, hardware information (the configuration information of programmable devices) can be handled in the same way as a software program. This preserves not only the performance of hardware computing but also the flexibility of software.

Corresponding author: Xin Zuo (email: zuoxin@hnu.edu.cn)
HW/SW co-design is popular in the design of embedded systems, and HW/SW partitioning is its most important phase. HW/SW partitioning is a typical optimization problem. If only one cost of the embedded system is considered, it is a single-objective optimization problem. Unfortunately, real embedded systems impose strict requirements on size, power consumption, time consumption and reliability, so partitioning becomes a multiobjective optimization problem. Furthermore, this multiobjective optimization problem is NP-complete, so no polynomial-time algorithm is known to solve it. For this reason, many researchers have focused on the HW/SW partitioning problem.
Generally speaking, the HW/SW partitioning problem involves two scenarios: static HW/SW partitioning and dynamic HW/SW partitioning. In static HW/SW partitioning, whether each functional unit is mapped to hardware or to software is decided during the design of the embedded system. Once the design is completed, the implementation of a functional unit cannot be changed. The emergence of reconfigurable technology allows us to use dynamic HW/SW partitioning, in which the implementation of each functional unit can be changed at runtime. Dynamic HW/SW partitioning makes the embedded system flexible. However, one of its main drawbacks is that, if the partitioning algorithm performs poorly, the performance of the whole embedded system will be poor. For this reason, static HW/SW partitioning is usually used for large-scale partitioning problems.
In this paper, our problem targets embedded systems with large-scale task sets, so static partitioning is adopted. In particular, we use SFLA for HW/SW partitioning under an area constraint. SFLA is popular for its simple concepts, few parameters to adjust, fast calculation speed, strong global search capability and ease of implementation. The main contributions of our paper are as follows:
• We are the first to apply SFLA to the HW/SW partitioning problem.
• Our objective function is the completion time of the critical path in the DAG, which is more realistic for a real reconfigurable system on chip.
• We compare SFLA with three algorithms: the greedy algorithm, the simulated annealing algorithm, and an algorithm that combines the greedy and simulated annealing algorithms.
• Our algorithm is compared with the other algorithms under three different area constraints.

The rest of this paper is organized as follows. Section II introduces the research background and previous related work. Section III outlines the hardware model, the computing model and the problem definition. Section IV details how the shuffled frog leaping algorithm is applied to the HW/SW partitioning problem. Experimental results are presented in Section V, and conclusions in Section VI.
II. RELATED WORKS
A number of researchers have carried out research on models of the HW/SW partitioning problem. To model the partitioning automatically, Adhipathi proposes a novel strategy, Process Model Graphs, which he simulates and verifies [1]. Sapienza et al. propose a meta model for the HW/SW partitioning problem that supports both partitioning and reuse [2]. However, they do not apply an algorithm to solve the HW/SW partitioning problem.
Some researchers focus on partitioning algorithms for the HW/SW partitioning problem. Niemann et al. were the first to propose using integer programming (IP) to solve the HW/SW partitioning problem. Integer programming can obtain the optimal solution, but it performs poorly on large-scale problems [3]. Resano et al. consider a system performance constraint and propose a strategy for HW/SW partitioning and task scheduling that reduces energy consumption [4]. In [5], Abdelhalim et al. combine several swarm intelligence algorithms for the unconstrained design problem in embedded systems. According to them, particle swarm optimization followed by a genetic algorithm obtains better results than any other sequence [6]. Arato et al. propose a polynomial-time heuristic algorithm for the HW/SW partitioning problem. Guo et al. focus on automated HW/SW partitioning, targeting a real-time operating system in the SoC, and propose a discrete Hopfield neural network algorithm to solve it [7].
In 2006, Kaizhong et al. model HW/SW partitioning as a 0-1 model over IP cores. This algorithm, called the 0-1 algorithm, makes full use of the IP cores and yields efficient partitioning [8].
In 2007, Mann et al. and Farmahini et al. respectively propose new algorithms based on the branch and bound algorithm [9] and the particle swarm optimization algorithm [10]. In order to improve the reuse of system functionalities, Arunachalam et al. propose a genetic algorithm for systems with requirements on the concurrency of functionalities and cost constraints [11]. Liu et al. use an improved directed acyclic graph to model the HW/SW partitioning problem and then use an immune algorithm to solve it [12]. Furthermore, for multi-objective optimization in the HW/SW partitioning problem, they propose an immune algorithm to obtain the Pareto optimal solution [13].
In 2013, Pando et al. take a fuzzy approach to the HW/SW partitioning problem, which is attractive for its flexibility [14]. For a special application, a sensorless current controller, Bahri et al. put forward a non-dominated sorting genetic algorithm for HW/SW partitioning, which can obtain the optimal solution [15]. In [16], Hansan et al. propose a reliable delay estimation approach. In [17], Han et al. propose a heuristic solution for scheduling and partitioning on multi-processor systems on chip (MPSoC). Wu et al. come up with two methods to solve the HW/SW partitioning problem: they first transform the partitioning problem into an extended 0-1 knapsack problem, and then solve it with a heuristic derived from that first step [18]. Sha et al. focus on the HW/SW partitioning problem on MPSoC. They use dynamic programming to minimize the system's power consumption under time and area constraints, and propose an optimal algorithm for tree-structured inputs and a heuristic one for DAGs [19]. As seen from the literature, researchers propose their own algorithms for different target systems with different granularity. However, none of them uses SFLA to solve the HW/SW partitioning problem.
A number of researchers focus on using SFLA to solve optimization problems. SFLA is popular for its simple concepts, few parameters to adjust, high calculation speed, strong global search capability and ease of implementation. In [20], Elbeltagi et al. compare the memetic algorithm (MA), the ant colony algorithm (ACO), genetic algorithms (GA), and SFLA. In [21], Horng et al. propose the maximum entropy based SFLA thresholding (MESFLOT) method and use it for multilevel image threshold selection. For the economic load dispatch problem, Roy et al. combine the genetic algorithm with an improved shuffled frog leaping algorithm (MSFLA), which obtains a better solution in this instance [22]. Xiao et al. apply an improved shuffled frog leaping algorithm (MSFL) to the simulation capability scheduling problem in a cloud simulation platform, a problem with a number of multimode constraints [23]. Xu et al. propose an improved SFLA to solve the hybrid flowshop problem with multiprocessor tasks [24]. For multi-depot vehicle routing problems, Luo et al. propose an improved SFLA [25]. However, none of these works considers using SFLA for the HW/SW partitioning problem. In this paper, we apply SFLA to the coarse-grained HW/SW partitioning problem on a reconfigurable system on chip.
III. MODELS
In this section, we introduce the hardware architecture model and the computing model. We first describe the architecture of the reconfigurable system on chip and then introduce the Task Data Flow Graph (TDFG) model for the HW/SW partitioning problem.
A. Hardware model
Our target hardware architecture is shown in Figure 1. The embedded system consists of a CPU and a reconfigurable unit. The reconfigurable unit employs a Field Programmable Gate Array (FPGA), while the main memory uses non-volatile memory. The reconfigurable unit and the CPU access each other via the shared bus, and data is loaded into main memory over the bus and stored there. Non-volatile memories (NVMs) are characterized by low energy consumption, low cost, and high density, so NVMs are likely to be used as main memory instead of DRAM. With our future work on task scheduling on NVMs in mind, we use NVMs as main memory.

When a task is implemented in software, the value of ACost is 0. X is a binary variable that takes the value 0 or 1: X is 0 when the task is implemented in software, and 1 when it is implemented in hardware. Our model of the HW/SW partitioning problem is at the task level. Because we take a coarse-grained approach to HW/SW partitioning, the communication time between the CPU and the reconfigurable unit is taken into consideration.

In this section, we first describe the SFLA algorithm in detail, including its parameters and updating strategy. We then detail how SFLA is adopted to solve the HW/SW partitioning problem.
A. SFLA description
SFLA is a recent evolutionary heuristic algorithm with high computing performance and excellent global search capability. After outlining the basic principles of SFLA, we have come up with an updated SFLA based on a threshold selection strategy. This strategy addresses the large changes in individual spatial location, and the resulting slower convergence, caused by the local updating operation. It does not update individuals that fail to meet the threshold conditions, thereby reducing the spatial differences between individuals and improving the performance of the algorithm. Numerical experiments have demonstrated the effectiveness of the improved algorithm and determined its threshold parameters. SFLA was proposed by Eusuff and Lansey in 2003 to solve combinatorial optimization problems [26]. As an optimization algorithm that imitates a biological evolution process, SFLA integrates the advantages of two families of algorithms, namely the memetic algorithm (MA), based on memetic evolution, and particle swarm optimization (PSO), based on group behavior. SFLA is characterized by simple concepts, few parameters to adjust, high calculation speed, strong global search capability and ease of implementation.
The Shuffled Frog Leaping Algorithm (SFLA) is built on the idea of a group of frogs living in a wetland where many stones are scattered around. The frogs search among the stones and leap toward places with more food, and they exchange information through cultural communication. Each frog has its own culture, which is defined as one solution to the problem. The whole population of frogs in the wetland is divided into sub-groups, each of which has its own culture and executes a local search strategy. Within a sub-group, each frog's culture influences the other frogs and is in turn influenced by them, so each frog evolves together with its sub-group. Once the sub-groups have evolved to a certain point, ideas are exchanged across sub-groups (the global information exchange), and this shuffling among sub-groups is repeated until the termination conditions are met.
Algorithm parameters
Like other optimization algorithms, SFLA needs a few parameters: the number of frogs $F$, the number of populations $m$, the number of frogs in each population $n$, the maximum allowed step $D_{\max}$, the global best solution $P_x$, the local best solution $P_b$, the local worst solution $P_w$, the number of frogs in a sub-group $q$, and the number of local memetic evolution steps $LS$.
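The scalar parameters above can be grouped in a small configuration object. The sketch below is only illustrative: the class and field names are hypothetical, the defaults are the values reported in the experimental section, and the check that $F = m \times n$ is an assumption suggested by those values rather than a rule stated in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SFLAParams:
    """Hypothetical container for the scalar SFLA parameters."""
    num_frogs: int = 100          # F, total number of frogs
    num_groups: int = 10          # m, number of groups
    frogs_per_group: int = 10     # n, frogs in each group
    local_steps: int = 10         # LS, local memetic evolution steps
    max_iterations: int = 10000   # S_max, iteration limit used in Section V

    def __post_init__(self) -> None:
        values = (self.num_frogs, self.num_groups, self.frogs_per_group,
                  self.local_steps, self.max_iterations)
        if min(values) < 1:
            raise ValueError("all parameters must be positive")
        # Assumption: every frog belongs to exactly one group.
        if self.num_frogs != self.num_groups * self.frogs_per_group:
            raise ValueError("F must equal m * n")


params = SFLAParams()
```

Making the object immutable keeps one experiment's settings from being changed halfway through a run.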
Updating strategy
For the frog population, the solution with the global best fitness is denoted $P_x$, while the solutions with the best and the worst fitness in each sub-group are denoted $P_b$ and $P_w$, respectively. First, a local search is conducted in each sub-group; more precisely, an update operation is applied to the individual frog with the worst fitness in the sub-group.
The step of each frog is updated according to equation 1, and the value of each frog after the update is calculated by equation 2, where $D_s$ represents the adjustment vector of an individual frog and $D_{\max}$ represents the maximum allowed step of an individual frog [27].
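For reference, the update rule in the commonly cited formulation of SFLA can be written as below. Treating it as the content of equations 1 and 2 is an assumption, and $\mathrm{rand}()$, a uniform random number in $[0, 1]$, is a symbol not defined in the text above. For binary node values, $D_s$ in this form lies in $[-1, 1]$.

```latex
\begin{align}
D_s &= \mathrm{rand}() \cdot (P_b - P_w), \qquad -D_{\max} \le D_s \le D_{\max} \\
newD_w &= P_w + D_s
\end{align}
```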
B. SFLA for hardware/software partitioning
In this section, we present in detail how SFLA is used to solve the HW/SW partitioning problem on a reconfigurable system on chip. SFLA can obtain a near-optimal result while satisfying the area constraint. The SFLA algorithm for HW/SW partitioning is presented in Algorithm 1.
Before presenting the details of the SFLA algorithm, we define the fitness function, denoted by $fitness$, in Equation 3, where $TCosth_i$ represents the time cost when task $V_i$ is executed in hardware and $TCosts_i$ represents the time cost when task $V_i$ is executed in software. The fitness function $fitness$ represents the total completion time of all tasks on the critical path. We schedule the tasks with the list scheduling algorithm, which is not the focus of this paper; more details about list scheduling can be found in [28].

The area cost constraint, denoted by $C_a$, is given in Equation 4:

$$\sum_{i=1}^{n} ACost_i \cdot X_i \le C_a \qquad (4)$$

where $ACost_i$ represents the area cost of task $V_i$. Equation 4 states that the total area cost of all tasks is no more than $C_a$.

Algorithm 1 SFLA for HW/SW partitioning
1: Randomly initialize each node of graph G for each frog.
2: for iter ← 1 to $S_{\max}$ do
3:   Compute the fitness of each frog by equation 3.
4:   Sort all the frogs in descending order of $fitness$.
5:   Find the global optimal $P_x$.
6:   Divide the frogs into $m$ groups, each containing $n$ frogs.
7:   Find the local optimal $P_b$ and the local worst $P_w$ in each group.
8:   Update each node of each frog by equation 1 and equation 2.
9: end for
10: return $P_x$.
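The fitness and the area constraint used by Algorithm 1 can be sketched as follows. This is a simplified illustration under stated assumptions: communication time and the list scheduler are ignored, so the fitness reduces to the longest path through the task DAG. All function and variable names are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def area_cost(x, acost):
    """Total area of the tasks mapped to hardware (X_i = 1)."""
    return sum(a for xi, a in zip(x, acost) if xi == 1)


def critical_path_time(x, tcost_hw, tcost_sw, preds):
    """Completion time of the longest path in the DAG for partition x.

    preds maps a task index to the indices of its predecessors.
    Assumption: no communication cost and unlimited parallelism.
    """
    n = len(x)
    cost = [tcost_hw[i] if x[i] == 1 else tcost_sw[i] for i in range(n)]
    graph = {i: tuple(preds.get(i, ())) for i in range(n)}
    finish = {}
    for i in TopologicalSorter(graph).static_order():
        start = max((finish[p] for p in graph[i]), default=0)
        finish[i] = start + cost[i]
    return max(finish.values())


# Diamond-shaped example: 0 -> {1, 2} -> 3 (made-up costs).
preds = {1: [0], 2: [0], 3: [1, 2]}
tcost_sw = [4, 6, 3, 5]
tcost_hw = [2, 2, 1, 3]
acost = [3, 5, 2, 4]
x = [0, 1, 0, 1]
print(critical_path_time(x, tcost_hw, tcost_sw, preds), area_cost(x, acost))  # prints: 10 9
```

A partition is feasible when `area_cost(x, acost)` does not exceed $C_a$.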
The following sheds more light on the details of the SFLA algorithm for the HW/SW partitioning problem. In step 1, we randomly initialize each node of graph G for each frog, that is, X is randomly set to 0 or 1. Steps 2 to 8 are the optimization procedure of SFLA for the HW/SW partitioning problem. In step 3, we compute the fitness $fitness$ of each frog in the swarm by equation 3. Then, in steps 4 and 5, we sort all the frogs in descending order of $fitness$ and find the global optimal $P_x$. In step 6, we divide the frogs into $m$ groups, each containing $n$ frogs. In step 7, we find the local optimal $P_b$ and the local worst $P_w$ in each group.
Step 8 is the most important phase of the SFLA algorithm: we update each node of the frog with the worst fitness by equation 1 and equation 2. When the number of iterations reaches $S_{\max}$, the SFLA algorithm terminates.
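Steps 4 to 7 can be sketched as below. Two points are assumptions rather than statements from the text: a lower completion time is treated as the better fitness, and frogs are dealt into groups round-robin, which is the usual choice in SFLA. The function name is hypothetical.

```python
def divide_into_groups(frogs, fitness, m):
    """Sort frogs best-first and deal them round-robin into m groups.

    Within each returned group the first frog plays the role of P_b
    and the last frog the role of P_w.
    """
    # Assumption: lower completion time means better fitness.
    order = sorted(range(len(frogs)), key=lambda i: fitness[i])
    groups = [[] for _ in range(m)]
    for rank, i in enumerate(order):
        groups[rank % m].append(frogs[i])
    return groups


frogs = ["a", "b", "c", "d", "e", "f"]
fitness = [30, 10, 50, 20, 60, 40]
print(divide_into_groups(frogs, fitness, 2))
# prints: [['b', 'a', 'c'], ['d', 'f', 'e']]
```

The overall best frog ($P_x$) is then the first frog of the first group.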
In this paper, the HW/SW partitioning problem has an area cost constraint. When we randomly initialize and update the nodes of each frog, we may obtain an infeasible solution, that is, one whose total area cost exceeds the limit. During initialization, we sum the area costs $ACost_i$ for $i$ from 1 to $n$ over the tasks with X equal to 1. If the total area cost exceeds $C_a$, the initialization procedure is stopped. We do the same during the updating process.
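One way to read this rule is sketched below. It is an interpretation rather than the exact procedure: when adding a task to hardware would exceed $C_a$, initialization stops and the remaining tasks stay in software. The function name and the 0.5 probability are hypothetical.

```python
import random


def random_feasible_frog(acost, c_a, rng=random):
    """Randomly map tasks to hardware (1) or software (0) within area c_a."""
    x = [0] * len(acost)
    used = 0
    for i, a in enumerate(acost):
        if rng.random() < 0.5:      # candidate for hardware
            if used + a > c_a:
                break               # budget would be exceeded: stop here
            x[i] = 1
            used += a
    return x


frog = random_feasible_frog([3, 5, 2, 4], c_a=7, rng=random.Random(0))
```

Passing a seeded `random.Random` instance makes a run reproducible.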
Step 8 is detailed in Procedure 1.

Procedure 1
1: for i ← 1 to $m$ do
2:   for j ← 1 to $LS$ do
3:     Update the frog with the worst fitness $P_w$ by equation 1 and equation 2.
4:     if the fitness of $newD_w$ is better than that of $P_w$ then
5:       Use the new solution $newD_w$ instead of the old solution $P_w$.
6:     else
7:       Use $P_x$ instead of $P_b$ in equation 1.
8:       Update as in step 3.
9:       if the fitness of $newD_w$ is worse than that of $P_w$ then
10:        Randomly initialize the frog with the worst fitness $P_w$.
11:      end if
12:    end if
13:    Sort all the frogs in the group in descending order of $fitness$.
14:    Find the local optimal $P_b$ and the local worst $P_w$ in the group.
15:  end for
16: end for

According to equation 1, the value of $D_s$ can lie between 0 and 1. When we update each frog with equation 2, $newD_w$ can take any value between -1 and 1. However, the HW/SW partitioning problem in this paper is a binary problem, so each node of a frog must take the value 1 or 0. We therefore use the following rules to set the value of a node.
• If the value of $newD_w$ is greater than 0.5, the value of the node is 1, which means that the task will be implemented in hardware.
• If the value of $newD_w$ is between -0.5 and 0.5, the value of the node is left unchanged.
• If the value of $newD_w$ is lower than -0.5, the value of the node is 0, which means that the task will be implemented in software.
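The three rules above translate directly into code. In this minimal sketch the function names are hypothetical, and treating the exact boundary values -0.5 and 0.5 as "unchanged" is an assumption.

```python
def binarize_node(old_value, new_dw):
    """Map a continuous update value to a binary node value."""
    if new_dw > 0.5:
        return 1            # task is implemented in hardware
    if new_dw < -0.5:
        return 0            # task is implemented in software
    return old_value        # between -0.5 and 0.5: keep the current mapping


def binarize_frog(old_frog, new_values):
    """Apply the node rule to every task of a frog."""
    return [binarize_node(o, v) for o, v in zip(old_frog, new_values)]


print(binarize_frog([0, 1, 1, 0], [0.8, -0.7, 0.2, -0.1]))  # prints: [1, 0, 1, 0]
```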
V. EXPERIMENTAL RESULTS
In this section, we present the experimental results of evaluating SFLA on the HW/SW partitioning problem. The benchmarks are DFGs randomly generated by Task Graphs For Free (TGFF) [29], which is widely used in embedded system and HW/SW co-design research for its flexibility and standardized performance. TGFF generates pseudo-random task graphs for scheduling and allocation research. All experiments are executed on a simulator written in C++, running on an Intel(R) Core(TM) Duo i3-3327U processor at 1.90 GHz.
We set the experiment parameters as follows:
• A is the area cost when all tasks are implemented by hardware.
• The number of frogs F is 100.
• The number of groups m is 10.
• The number of frogs in each group $n$ is 10.
• The number of local memetic evolution steps $LS$ is 10.
• The maximum number of iterations $S_{\max}$ is 10000.
• For our experiments, we choose three different area constraints: $C_a = A/3$, $C_a = A/2$, and $C_a = 3A/4$.

We compare the completion time of the tasks on the critical path among the Greedy Algorithm derived from Grode's algorithm [30], the Simulated Annealing algorithm (SA) derived from Eles's algorithm [31], the combined Greedy and Simulated Annealing algorithm (GSA) derived from Jing's algorithm [32], and the SFLA algorithm. The time costs are shown in Figure 3, Figure 4, and Figure 5, respectively. As shown in Table V, compared with the SA algorithm, our algorithm reduces the time cost by about 41.09% in the best case across the three area constraints, and by about 9.74% in the worst case.
As shown in Table VI, compared with the GSA algorithm, our algorithm reduces the time cost by 18.99% in the best case across the three area constraints, and by about 3.37% in the worst case.
Across the three experiments, SFLA performs best when the area constraint $C_a$ is equal to A/2. In that case, the completion time is reduced on average by 51.30%, 21.04% and 11.61% compared with the greedy algorithm, SA, and GSA, respectively.
VI. CONCLUSIONS
In this paper, we apply SFLA for the first time to the HW/SW partitioning problem on a reconfigurable system on chip. The potential of SFLA is substantially exploited, so that we obtain a low completion time for the critical path. Compared with the greedy algorithm, the experimental results show that SFLA reduces the time cost by 45.54% on average over the three area constraints. Compared with the simulated annealing algorithm and the combined greedy and simulated annealing algorithm, SFLA reduces the time cost by 23.57% and 9.99% on average, respectively. When the area constraint $C_a$ is A/2, SFLA reduces the time cost by 51.30%, 21.04% and 11.61% on average compared with the greedy algorithm, SA, and GSA, respectively.
In the future, we will do more research on hardware/software partitioning algorithms. More specifically, we will work on improving SFLA and compare it with more algorithms, such as PSO, GA, and Chemical Reaction Optimization (CRO).
