Abstract
Introduction
Many electronics products are developed on embedded multiprocessor field-programmable gate array (FPGA) systems, which have the following advantages. First, flexible hardware and software architectures provide various computing abilities that meet performance requirements. Second, floating-point operations are more easily implemented by running applications on a processor platform than on a hardware circuit. Third, various functions associated with hardware and software tasks can be evaluated rapidly. Finally, the time and money required for system implementation are reduced compared with those for application-specific integrated circuits (ASICs). Consequently, embedded multiprocessor FPGA systems have been adopted by both industry and academia. Industrial applications include the design and verification of portable devices, consumer electronics and telematics. In terms of academic research, the relevant fields are hardware-software partitioning, hardware-software codesign and hardware-software cosynthesis.
Hardware-software partitioning is a nondeterministic polynomial (NP)-hard problem, so a partitioning result that meets all system constraints, let alone an optimal feasible solution, is difficult to obtain. Therefore, many researchers [1]-[13] have developed various hardware-software partitioning schemes for effective partitioning. These techniques include genetic algorithms (GAs) [1]-[3], multi-level partitioning (MLP) [4], [5], recursive spectral bisection (RSB) with a scheduling algorithm [6], hardware-oriented partitioning (HOP) [7], enhancement partitioning [8], efficient partitioning [9], sophisticated computed partitioning [10] and Kernighan and Lin partitioning [13]. However, these studies did not discuss the effect of the relational degree of each partitioning combination on system constraints in embedded multiprocessor FPGA systems. Consequently, this work presents a grey relational hardware-software partitioning scheme for obtaining a partitioning result with low power consumption and fast execution time.
The remainder of this paper is organized as follows. Background and preliminary work for hardware-software partitioning are introduced in Section 2. Section 3 addresses the grey relational hardware-software partitioning approach. Experimental results for a benchmark case with 199 tasks are presented in Section 4. Finally, conclusions are given in Section 5.
Related work

Background
A genetic algorithm (GA) evolves successive generations via mutation and crossover to solve the hardware-software partitioning problem. Saha et al. [1] used a GA to convert the initial hardware-software partition into a constraint satisfaction problem (CSP), which they solved to obtain partitioning results in terms of cost, execution time and concurrency constraints. In 2010, Aoshima and Kanasugi [2] adopted redundant binary numbers in a genetic algorithm for a processor, which can find the optimized solution and decrease error rates. Zou et al. [3] considered the data communication time between hardware and software in a system when performing hardware-software partitioning. Moreover, they suggested that executing mutation in a GA should depend on the fitness function rather than on the mutation rate.
Embedded multiprocessor systems provide a powerful and flexible architecture for implementing the complex functionality of consumer electronics. In 2001, Lee et al. [4] presented a multi-level partitioning (MLP) method to perform hardware-software partitioning in distributed embedded multiprocessor systems (DEMS). They used a gradient metric on cost and performance to improve the partitioning result. In 2000, Pomante [5] proposed a heuristic multi-level partitioning method consisting of system-level and functional-level partitions. Pomante focused on procedures as system specifications to discuss partitioning in heterogeneous multiprocessor embedded systems. In 2006, Lin et al. [6] adopted partitioning and scheduling phases, corresponding to the recursive spectral bisection (RSB) algorithm and to scheduling on embedded multiprocessor systems. In 2007, Lee et al. [7] [8] presented two partitioning techniques to solve the hardware-software partitioning problem of embedded multiprocessor FPGA systems. They introduced hardware-oriented partitioning (HOP) [7], which initially implements all functionality in hardware and then reallocates some functions to software implementation to improve partitioning results. Moreover, enhancement partitioning [8], which comprises fitting-system constraints and hardware-oriented partitioning, was demonstrated. In 2008, efficient partitioning [9] applied a GA and hardware-oriented partitioning to achieve a partitioning result. As an exhaustive approach, sophisticated computed partitioning [10] used two-level partitioning comprising the processors-fit level and multi-fit level constraints. Regarding code generation, Dossis [11] described an approach to automatically produce non-standard, custom and directly implementable hardware description code for intelligent web services. In addition, Fan et al. [12] proposed a retargetable code generation (RCG) methodology that can generate retargetable C programs for embedded processors.
Preliminary work
A task is an atomic unit in an embedded multiprocessor FPGA system. Each task can be implemented as a hardware component or as a software procedure when developing an embedded multiprocessor FPGA system. In other words, each task has two types of implementation. The hardware-software partitioning technique thus becomes a decision system that determines whether each task is implemented as hardware or software. Accordingly, an embedded multiprocessor FPGA system consists of a set of tasks that is partitioned into two sets, a hardware set H and a software set S, after hardware-software partitioning.
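The task model described above can be illustrated with a minimal Python sketch. The class name `Task`, its field names, the helper `split_tasks` and all numeric values are hypothetical and introduced here only for illustration; the sketch assumes that each task carries an execution time and a power figure for both of its implementations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    """One atomic task with costs for both implementation types (hypothetical fields)."""
    task_id: int
    hw_time: float   # execution time when implemented as hardware
    hw_power: float  # power consumption when implemented as hardware
    sw_time: float   # execution time when implemented as software
    sw_power: float  # power consumption when implemented as software


def split_tasks(tasks, decisions):
    """Split tasks into hardware set H and software set S.

    `decisions` maps task_id to "hw" or "sw"; it stands for the output
    of any partitioning scheme.
    """
    H = [t for t in tasks if decisions[t.task_id] == "hw"]
    S = [t for t in tasks if decisions[t.task_id] == "sw"]
    return H, S


# Illustrative values only, not taken from the benchmark.
tasks = [
    Task(0, hw_time=1.0, hw_power=2.0, sw_time=10.0, sw_power=0.5),
    Task(1, hw_time=2.0, hw_power=1.5, sw_time=12.0, sw_power=0.4),
    Task(2, hw_time=0.5, hw_power=2.5, sw_time=8.0, sw_power=0.3),
]
decisions = {0: "hw", 1: "sw", 2: "hw"}
H, S = split_tasks(tasks, decisions)
```

Every task ends up in exactly one of the two sets, which is the property a partitioning scheme must preserve.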
This work developed a partitioning tool, shown in Figures 1 and 2, which is based on the Kernighan and Lin algorithm [13]. The partitioning tool can construct a bisected and balanced system of sets H and S. Additionally, the developed tool can reduce external cost by swapping a subset from H with one from S. Figure 1 shows a case with 30 randomly generated weighted tasks (i.e., Nos. 0-29) comprising the H (left) and S (right) sets. This case has two costs: internal and external. An arc that connects tasks within H or within S contributes to internal cost. External cost is defined by the cross connections between H and S tasks. This work discusses the un-weighted and weighted effects on internal and external costs in the developed tool. Figure 2 shows the swapping of a subset from H with one from S. Although the ability of the developed tool was extended from the un-weighted effect to the weighted effect, the tool does not consider the effect of the relational degree of each constraint during the swapping procedure.
Grey relational hardware-software partitioning
Grey relational analysis
The grey system was developed by Deng [14] in 1989 to solve uncertain system problems. Equations (1)-(7) are typically applied in grey relational analysis when analyzing the grey relational effect. This work applies grey relational analysis to achieve grey relational hardware-software partitioning.
1) An embedded multiprocessor FPGA system can be partitioned into a set of tasks mapped to a set of $X_i(k)$, where $i$ indexes a hardware or software task and $k$ indexes the constraints. Therefore, an oriented dataset, $X_i(k)$, is generated by Eq. (1).
2) System constraints are set to $X_0(k)$, which is shown in Eq. (2.1).
3) Hardware set H and software set S, which are shown in Eqs. (2.2) and (2.3), respectively, can be distinguished from $X_i(k)$. Hardware set H is set to $X_1^{hw}(k), X_2^{hw}(k), \ldots, X_m^{hw}(k)$, and software set S is set to $X_1^{sw}(k), X_2^{sw}(k), \ldots, X_m^{sw}(k)$.
4) According to Eq. (3), $X_i^{hw}(k)$ or $X_i^{sw}(k)$ becomes a candidate task: $task^{e}_{candidate}$ and $task^{p}_{candidate}$ are determined by the larger of $\gamma(X_0(k), X_i^{hw}(k))$ and $\gamma(X_0(k), X_i^{sw}(k))$. Thus, each task can be identified and implemented as hardware or software. This procedure is repeated until $X_m^{hw}(k)$ or $X_m^{sw}(k)$ is derived. Therefore, two partitioning results, one comprising the $task^{e}_{candidate}$ set and one comprising the $task^{p}_{candidate}$ set, are generated.
5) A grey relational hardware-software partitioning result is obtained by selecting, of the two partitioning results, the one with the larger $\gamma(X_0, X_i)$ derived by Eq. (7).
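For orientation, the quantities referred to in Eqs. (3)-(7) conventionally take the following form in grey relational analysis. This is the standard textbook formulation, given here as an assumption; it is not necessarily identical to the equations of this work, which may additionally normalize the data. The symbol $K$, the number of constraints, is introduced here for illustration, and $\delta$ is the distinguishing coefficient.

```latex
\begin{align}
\Delta_{0i}(k) &= \lvert X_0(k) - X_i(k) \rvert
  && \text{deviation from the constraint} \\
\Delta_{\min} &= \min_i \min_k \Delta_{0i}(k),
  \qquad
  \Delta_{\max} = \max_i \max_k \Delta_{0i}(k) \\
\gamma\bigl(X_0(k), X_i(k)\bigr) &=
  \frac{\Delta_{\min} + \delta\,\Delta_{\max}}
       {\Delta_{0i}(k) + \delta\,\Delta_{\max}}
  && \text{grey relational coefficient} \\
\gamma(X_0, X_i) &= \frac{1}{K} \sum_{k=1}^{K}
  \gamma\bigl(X_0(k), X_i(k)\bigr)
  && \text{grey relational grade}
\end{align}
```

In this form the coefficient lies in $(0, 1]$ and grows as the deviation $\Delta_{0i}(k)$ shrinks, so the implementation closer to the constraint receives the larger coefficient.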
An example of power consumption during grey relational analysis in hardware-software partitioning is as follows. Suppose the reference value $X_i$ is 181.35, and $X_j^{hw}$ and $X_j^{sw}$ are 0.383 and 1.067, respectively. The $\Delta_{ij}$ of $X_j^{hw}$ and $X_j^{sw}$ can be derived separately as 180.97 and 180.28 by Eq. (4). Next, $\Delta_{\min}$ and $\Delta_{\max}$, as shown in Eqs. (5) and (6), are computed from the dataset as 0.0027 and 2.92. The distinguishing coefficient $\delta$ is typically set to 0.5. After these parameters of the grey relational coefficient are calculated, $\gamma^{hw}(X_i, X_j^{hw})$ and $\gamma^{sw}(X_i, X_j^{sw})$ are derived as 0.0161 and 0.0162, as shown in Eq. (3). Thus, the task becomes a candidate for software implementation, denoted $task^{p}_{candidate}$, because $\gamma^{sw}(X_i, X_j^{sw})$ is larger than $\gamma^{hw}(X_i, X_j^{hw})$.
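The deviation step and the selection rule of this example can be checked with a short Python sketch. The function names are hypothetical. The coefficient function uses the conventional grey relational form as an assumption, so its absolute values need not match those quoted above, which depend on the exact form of Eq. (3); the ordering of the two coefficients depends only on which deviation is smaller.

```python
def deviation(reference, value):
    """Absolute deviation between the reference and a task value (Eq. (4))."""
    return abs(reference - value)


def grey_coefficient(delta, delta_min, delta_max, dist_coeff=0.5):
    """Conventional grey relational coefficient (assumed form, not the paper's Eq. (3) verbatim)."""
    return (delta_min + dist_coeff * delta_max) / (delta + dist_coeff * delta_max)


# Values taken from the worked example in the text.
x_ref = 181.35
x_hw, x_sw = 0.383, 1.067
delta_min, delta_max = 0.0027, 2.92

delta_hw = deviation(x_ref, x_hw)
delta_sw = deviation(x_ref, x_sw)

gamma_hw = grey_coefficient(delta_hw, delta_min, delta_max)
gamma_sw = grey_coefficient(delta_sw, delta_min, delta_max)

# The implementation with the larger coefficient becomes the candidate.
candidate = "sw" if gamma_sw > gamma_hw else "hw"
print(round(delta_hw, 2), round(delta_sw, 2), candidate)  # 180.97 180.28 sw
```

The deviations reproduce the values 180.97 and 180.28 given in the example, and the software implementation is selected because its deviation is the smaller of the two.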
Grey relational partitioning algorithm
The grey relational partitioning algorithm, GreyRelationalHW-SWPartition() (Fig. 3), can obtain a partitioning result with low power consumption and rapid execution time. The algorithm requires five inputs: $X_0$, $X_e^{hw}$, $X_p^{hw}$, $X_e^{sw}$ and $X_p^{sw}$. System constraints are set to $X_0$. $X_e^{hw}$ and $X_p^{hw}$ represent the execution time and power consumption of the hardware task, respectively. Similarly, $X_e^{sw}$ and $X_p^{sw}$ are the execution time and power consumption of the software task, respectively. Each task is assigned to hardware or software after the grey relational values $\gamma_e^{hw}$, $\gamma_e^{sw}$, $\gamma_p^{hw}$ and $\gamma_p^{sw}$ are computed to identify candidate tasks. After all candidate tasks are identified, two partitioning results, $task^{e}_{candidate}$ and $task^{p}_{candidate}$, are generated. Finally, grey relational analysis is adopted to obtain a partitioning result. Figure 3 shows the proposed GreyRelationalHW-SWPartition() algorithm for grey relational hardware-software partitioning. The running time of GreyRelationalHW-SWPartition() is O(n) for labels 1-12. Label 13, which takes O(m log m), is used to find the maximum grey relational value for $task^{e}_{candidate}$ or $task^{p}_{candidate}$. Therefore, the time complexity of GreyRelationalHW-SWPartition() is O(n × m log m).
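Because the pseudocode of Figure 3 is not reproduced in this text, the following Python sketch is only one possible reading of the description above, not the authors' implementation. It assumes the conventional grey relational coefficient, a constraint pair `x0 = (time constraint, power constraint)`, minimum and maximum deviations taken jointly over the hardware and software values of each constraint, and a final grade computed as the mean coefficient of a candidate partition. All function names and numeric values are hypothetical.

```python
def coefficients(reference, values, dist_coeff=0.5):
    """Grey relational coefficients of each value against one reference (conventional form)."""
    deviations = [abs(reference - v) for v in values]
    d_min, d_max = min(deviations), max(deviations)
    if d_max == 0:  # every value equals the reference
        return [1.0] * len(values)
    return [(d_min + dist_coeff * d_max) / (d + dist_coeff * d_max) for d in deviations]


def grey_relational_partition(x0, x_e_hw, x_p_hw, x_e_sw, x_p_sw, dist_coeff=0.5):
    """Return a list of "hw"/"sw" decisions, one per task (assumed reading of the algorithm)."""
    n = len(x_e_hw)
    # Coefficients per constraint, computed over hardware and software values together.
    g_e = coefficients(x0[0], list(x_e_hw) + list(x_e_sw), dist_coeff)
    g_p = coefficients(x0[1], list(x_p_hw) + list(x_p_sw), dist_coeff)
    g_e_hw, g_e_sw = g_e[:n], g_e[n:]
    g_p_hw, g_p_sw = g_p[:n], g_p[n:]

    # Candidate partition driven by execution time, and one driven by power consumption.
    part_e = ["hw" if g_e_hw[i] >= g_e_sw[i] else "sw" for i in range(n)]
    part_p = ["hw" if g_p_hw[i] >= g_p_sw[i] else "sw" for i in range(n)]

    def grade(partition):
        """Mean coefficient over all tasks and both constraints for one partition."""
        total = 0.0
        for i, impl in enumerate(partition):
            if impl == "hw":
                total += g_e_hw[i] + g_p_hw[i]
            else:
                total += g_e_sw[i] + g_p_sw[i]
        return total / (2 * n)

    # Keep the candidate partition with the larger grey relational grade.
    return max((part_e, part_p), key=grade)


# Illustrative values only.
result = grey_relational_partition(
    x0=(1.0, 0.5),
    x_e_hw=[1.0, 2.0, 3.0], x_p_hw=[2.0, 1.5, 2.5],
    x_e_sw=[10.0, 4.0, 20.0], x_p_sw=[0.5, 0.6, 0.4],
)
print(result)
```

In this toy input the time-driven candidate assigns every task to hardware and the power-driven candidate assigns every task to software; the final step keeps whichever of the two has the larger grade.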
Experimental results
This experiment makes eight assumptions in order to focus the discussion on grey relational hardware-software partitioning. First, any embedded multiprocessor FPGA system can be divided into a set of tasks that can be modeled by a task graph. Second, all tasks can be implemented as hardware or as software. Third, the power consumption and execution time of each hardware task and software task can be measured. Fourth, hardware tasks consume more power than software tasks. Fifth, hardware tasks have faster execution times than software tasks. Sixth, implementing a hardware task requires only FPGA slices, whereas a software task requires only FPGA memory. Seventh, the interface cost is zero. Finally, every processor in the multiprocessor has the same characteristics, implying that each processor has the same computation capability for each software task.
This work uses a 199-task benchmark [15] to demonstrate the feasibility of the proposed scheme. As a well-defined benchmark is not supplied by academia or industry, Purnaprajna et al. [15] presented a case with 199 tasks and a randomly generated task graph to simulate an embedded system. This study implements a graphical user interface that generates a task graph randomly (Figure 4). The evaluation parameters are execution time and power consumption. The execution time parameters for hardware and software tasks were generated randomly in the ranges 0-4 and 0-30, respectively. The power consumption parameters for hardware and software tasks were generated randomly in the ranges 0-3 and 0-1, respectively. The distinguishing coefficient δ of the grey relational coefficient is set to 0.5, its typical value. This experiment was performed five times (column 1 in Table 1). Columns 2 and 3 in Table 1 list experimental results obtained using the proposed method. Columns 4 and 5 in Table 1 show the case 1 results obtained by Purnaprajna et al. [15]. Columns 2 and 4 in Table 1 compare power consumption, indicating that the proposed method has lower power dissipation in all cases. In terms of execution time
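The parameter generation described above can be sketched in Python as follows. The function name, the dictionary keys, the uniform distribution and the fixed seed are assumptions made for illustration; the text specifies only the ranges and the number of tasks, and the task graph edges are not generated here.

```python
import random


def generate_benchmark(n_tasks=199, seed=0):
    """Generate random per-task parameters within the ranges given in the text.

    The uniform distribution and the seed are assumptions of this sketch.
    """
    rng = random.Random(seed)
    return [
        {
            "id": i,
            "hw_time": rng.uniform(0, 4),    # hardware execution time, range 0-4
            "sw_time": rng.uniform(0, 30),   # software execution time, range 0-30
            "hw_power": rng.uniform(0, 3),   # hardware power consumption, range 0-3
            "sw_power": rng.uniform(0, 1),   # software power consumption, range 0-1
        }
        for i in range(n_tasks)
    ]


benchmark = generate_benchmark()
print(len(benchmark))  # 199
```

Fixing the seed makes a run repeatable; repeating the experiment five times, as in Table 1, would correspond to five different seeds under this sketch.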
Conclusions
Hardware-software partitioning is used to solve a partitioning problem for embedded multiprocessor FPGA systems. This work applies grey relational hardware-software partitioning to obtain a partitioning result. Moreover, a grey relational hardware-software partitioning algorithm with a time complexity of O(n × m log m) is presented. Specifically, the proposed method achieves a partitioning result with low power consumption and fast execution time via grey relational analysis. Experimental results for a benchmark with 199 tasks demonstrate the feasibility of the proposed approach for hardware-software partitioning in embedded multiprocessor FPGA systems.
