ABSTRACT Hardware/software (HW/SW) partitioning and scheduling are crucial steps in HW/SW co-design. They have a strong effect on the performance, area, and power of the system. In this paper, a memory-reinforced tabu search algorithm with critical path awareness (MTSP) is proposed for solving the HW/SW partitioning problem. First, the critical path (CP) algorithm locates the critical task queues and outputs a reduced task graph. Second, the solution of a heuristic algorithm (HA) is used as the initial solution. Third, dual memory tables based on hash technology improve the search strength and effectiveness of the tabu search, and the experiments are completed with priority scheduling. MTSP performs especially well on large task graphs and can greatly improve system performance, particularly when the communication penalty is large. The experimental results show that the average improvement over the latest efficient hybrid algorithm is up to 5%. The improvement in algorithm searching time is 66% in comparison to the popular algorithms cited in this paper.
I. INTRODUCTION
As transistor density increased, multiprocessor systems-on-chip (MPSoCs) emerged to combat the power wall: increasing processor clock speeds for faster performance also increased power (and heat) output [1]-[3]. System-on-chip (SoC) platforms composed of microprocessors and FPGAs are called reconfigurable MPSoCs [4], which offer flexibility and better design parameters. The general-purpose processor that implements software computing and the intellectual property (IP) core that implements hardware computing are collectively referred to as computing resources.
The traditional performance improvement method for MPSoC computing resources is the optimization of task partitioning [5], [6] and scheduling algorithms [7], [8]. However, this method has obvious limitations in two respects. First, the scheduling algorithm itself provides only limited acceleration of task execution, while its overhead cannot be ignored. Second, the type and quantity of computing resources and the scheduling algorithm affect each other. Combinatorial optimization can obtain the optimal solution. Therefore, the hardware configuration of multicore platforms has attracted extensive research.
The associate editor coordinating the review of this article and approving it for publication was Khursheed Aurangzeb.
Efficient techniques for HW/SW co-design [9] - [11] are necessary to realize embedded systems that must meet design constraints while satisfying the shorter time-to-market pressures [12] . HW/SW partitioning [9] , [13] , [14] is the crucial step during HW/SW co-design, and the HW/SW partitioning algorithm determines which components are implemented in hardware and which components are implemented in software [15] . First, it can guide the design and configuration of computing resources, reducing the overall power to achieve regional optimization; second, the system can be optimized to obtain the maximum acceleration.
HW/SW partitioning is motivated by Amdahl's law [16]: roughly 20% of the code accounts for 80% of the execution time. Executing a small number of tasks in parallel can therefore pay off, provided that the other overhead costs (communication and memory I/O) are analyzed to decide whether the move is beneficial. In recent years, much research has been performed on HW/SW partitioning, which can be divided into structural partitioning and functional partitioning [13], [17], [18]. Because structural partitioning yields more blocks, functional partitioning is usually used.
The HW/SW partitioning algorithm can optimize multiple targets, such as minimizing power consumption [19], [20] or hardware area [21], [22], and increasing the acceleration ratio [23]-[25]. Arato et al. [26] categorized HW/SW partitioning problems into two types: a small part can be solved in polynomial time, usually using dynamic programming [27], integer linear programming [28], or exact algorithms such as branch and bound [29]. Most of the remaining partitioning problems are NP-hard [30]. For NP-hard or larger partitioning problems, heuristic algorithms are the main topic of research.
Traditional heuristic algorithms include the genetic algorithm (GA) [31], simulated annealing (SA) [32], and tabu search (TS) [33]. In [34], a comparison of the TS, SA and GA algorithms demonstrates the advantages of TS. There are also many hybridizations of heuristic algorithms, such as the greedy simulated annealing (GSA) algorithm combining the greedy and SA algorithms [35], the modified GA algorithm with an efficient crossover operator [36], and the re-excited particle swarm optimization (PSO) algorithm [37]. The work in [38] proposed an efficient heuristic algorithm refined by the TS algorithm based on the multiple-choice knapsack problem (MCKP). These hybrid algorithms all perform well; most of them aim to optimize performance while preserving a wide search area so as to approach the global optimal solution as closely as possible.
These methods focus on optimizing the algorithm itself without considering the combination with the specific target platform. Given the differences in the co-design environments and the lack of a common standard [39], the results obtained cannot be compared with one another. This paper focuses on algorithms applied to reconfigurable MPSoCs [40]. Because it is difficult to determine the impact of the performance metrics on HW/SW partitioning, the problem calls for a multiobjective optimization model [41]; Wang et al. [14] used an uncertain model to analyze reconfigurable HW/SW partitioning issues. This paper presents the MTSP algorithm for the HW/SW partitioning problem on reconfigurable MPSoCs. First, a critical path algorithm is proposed, which locates the critical task queues and configures the crossbar to reduce the communication penalty. Second, because the quality of the solution is significantly reduced when the TS algorithm searches a large task graph, the HA algorithm is introduced to provide an initial solution that improves the search efficiency. Third, a dual memory table based on hash technology improves the search intensity, and the solution matches the architecture of the reconfigurable MPSoC well. Finally, the experiments verify the effectiveness of MTSP.
The rest of the paper is organized as follows. In Sect. 2, we introduce some definitions and the architecture utilized in this paper. In Sect. 3, we present the MTSP algorithm for HW/SW partitioning. In Sect. 4, we verify the effectiveness of our algorithm through experiments. Our discussions and performance comparisons are presented. In the final section, we conclude our research and analysis. 
II. PRELIMINARIES
A. ARCHITECTURE OF RECONFIGURABLE MPSoCs
The target reconfigurable MPSoC architecture in this paper is illustrated in Figure 1 . The algorithm can be compared with existing algorithms in a fair environment [14] without changing the reference architecture. The processing elements (PEs) include the processors and the hardware IP core. The system is built on programmable logic devices such as Xilinx FPGAs, such that the number of PEs for each type is configurable. The processors are isomorphic, and each processor can execute only one type of software task per unit time; each hardware IP core is heterogeneous, and each hardware computing unit is encapsulated into an IP core format and can perform only one hardware task belonging to a specific type of task set each time. In the platform to which the method is applied, in order to enable as many kinds of computing tasks as possible and provide acceleration, there is only one processor along with a plurality of hardware computing units. At the same time, in order to minimize the on-chip area consumed by the hardware platform, there is only one type of hardware IP core of each task type. In an attempt to meet the communication requirements between the processors and the FPGA, the two are connected by a crossbar switch to realize parallel execution of HW/SW tasks. Since sharing the same communication channel is bound to bring time loss, we use latency to quantify this penalty.
B. TASK GRAPH MODEL
Formally, the application to be partitioned is represented as a task graph: a directed acyclic graph (DAG) G = (V, E), where V is the set of nodes and E is the set of edges. Each node in V represents a task v. The execution time of each task v ∈ V in software is s_v, displayed in the upper left corner of the corresponding node in G, and the hardware execution time is h_v, displayed in the lower left corner of the corresponding node in G. The area penalty is displayed below the nodes of the task graph, and the value on an edge represents the communication penalty. The dependencies between tasks are represented by edges in the task graph; for example, edge (u, v) indicates that task v depends on the execution of task u; therefore, u is called a predecessor task of v, and v is a successor task of u. The predecessor set of task v is P(v), and the successor set is S(v). The important notation is summarized in Table 1. Tasks that have no predecessors are called start tasks; similarly, tasks that have no successors are called termination tasks. If the DAG has multiple start nodes, V_start is added as a predecessor of all the start nodes. Similarly, when there are multiple termination nodes, V_end is added after all the termination nodes. This ensures that the DAG has a unique start node and a unique termination node. The execution times of V_start and V_end are 0, and their communication with the rest of the task graph is cost free.
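The normalization to a single start node and a single termination node can be illustrated with a minimal sketch. The dictionary-based graph representation and the function name add_virtual_endpoints are assumptions made for illustration only; they are not taken from the paper.

```python
def add_virtual_endpoints(nodes, edges, start="V_start", end="V_end"):
    """Return (nodes, edges) extended with virtual endpoints where needed.

    nodes: dict mapping task name -> (software_time, hardware_time)
    edges: dict mapping (u, v) -> communication penalty
    (Both layouts are illustrative assumptions.)
    """
    has_pred = {v for (_, v) in edges}
    has_succ = {u for (u, _) in edges}
    starts = [n for n in nodes if n not in has_pred]
    ends = [n for n in nodes if n not in has_succ]

    new_nodes = dict(nodes)
    new_edges = dict(edges)
    if len(starts) > 1:
        # Virtual start: zero execution time, cost-free edges.
        new_nodes[start] = (0, 0)
        for n in starts:
            new_edges[(start, n)] = 0
    if len(ends) > 1:
        # Virtual end: zero execution time, cost-free edges.
        new_nodes[end] = (0, 0)
        for n in ends:
            new_edges[(n, end)] = 0
    return new_nodes, new_edges
```

A virtual node is added only when more than one start (or termination) node exists, so a graph that already has unique endpoints is returned unchanged.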
Simultaneously, [42] defines the granularity g(G) of the task graph G:
By definition, when g(G) ≥ 1, G is called a coarse-grained task graph. All task graphs in this paper are coarse-grained, which is a desirable property of task graphs.
We add some qualifications to the execution of the task graph on the target platform:
1) Each task in the task graph can be completed only by a specific PE, and the PE is selected according to the HW/SW partitioning algorithm.
2) The software PE executes one software task at a time, and the hardware PE can execute multiple tasks in parallel according to the area cost limit.
3) Once the task starts execution, it cannot be interrupted.
4) The high-speed and low-latency transmission can be realized by the configured crossbar switch, which can be regarded as cost free.
5) Communication time is the total time including read, write and so on.
Given tasks u and v with u ∈ P(v), if v is a software task, all tasks in P(v) must have completed before v can be executed on the software PE. Because the status of each predecessor task must be checked, c(P(v), v) is defined in this case as the sum of the communication penalties from the tasks in P(v). In the other case, if v is implemented on hardware PEs, c(P(v), v) is the maximum of these communication penalties. Formally, c(P(v), v) is defined by:
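The two cases described above can be written compactly as follows. This is a reconstruction from the prose only, assuming that c(u, v) denotes the communication penalty on edge (u, v); the equation as typeset in the original article is not reproduced here.

```latex
c(P(v), v) =
\begin{cases}
  \displaystyle\sum_{u \in P(v)} c(u, v), & \text{if } v \text{ is executed in software},\\[2ex]
  \displaystyle\max_{u \in P(v)} c(u, v), & \text{if } v \text{ is executed in hardware}.
\end{cases}
```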
We introduce the concept of Pref in [43] to evaluate the performance of the system:
T_soft is the total execution time of the software tasks, T_hard is the total execution time of the hardware tasks, and Pena is the sum of the communication costs incurred in the execution of all tasks. Figure 2 shows a task graph G with three nodes. After reading the task graph, we remove the communication cost from the edges by adding it to the execution time of the tasks, taking into account that a task executed on the software PE communicates with other tasks sequentially, whereas tasks executed on hardware PEs can communicate concurrently. Accordingly, we can simplify the task graph to obtain G', where s'_v and h'_v replace s_v and h_v in G, respectively:
Obviously, the new task graph G' retains all the features of the task graph, but the quantity of data is smaller, which is beneficial to the calculation of our partitioning algorithm.
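As a rough illustration of this simplification, the sketch below folds the incoming communication penalties of each task into its execution times: summed for software execution (sequential communication) and maximized for hardware execution (concurrent communication). The function name, the data layout, and the numbers in the example are hypothetical; they are not the values of Figure 2.

```python
def fold_communication(nodes, edges):
    """Return dict task -> (s_new, h_new) with communication folded in.

    nodes: dict task -> (s_v, h_v)
    edges: dict (u, v) -> communication penalty c(u, v)
    """
    incoming = {v: [] for v in nodes}
    for (_, v), cost in edges.items():
        incoming[v].append(cost)

    folded = {}
    for v, (s_v, h_v) in nodes.items():
        costs = incoming[v]
        s_new = s_v + sum(costs)                    # sequential on software
        h_new = h_v + (max(costs) if costs else 0)  # concurrent on hardware
        folded[v] = (s_new, h_new)
    return folded


# Hypothetical three-node example.
nodes = {"v1": (10, 4), "v2": (8, 3), "v3": (12, 5)}
edges = {("v1", "v3"): 2, ("v2", "v3"): 3}
folded = fold_communication(nodes, edges)
```

In this example only v3 has predecessors, so its software time grows by 2 + 3 and its hardware time by max(2, 3), while v1 and v2 are unchanged.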
For the execution of the task graph, we make the following assumptions:
1) The condition under which any task v i can start execution is that
, and the data ready time of the task vi is when the data of the predecessor task has been transmitted to the task node.
2) Tasks assigned to hardware can be executed in parallel.
3) Tasks assigned to the software need to be executed sequentially, depending on the task priority.
4) The priority of a software task is determined as follows. First, check the level: the tasks of one level in the task graph must be executed before the tasks of the next level can start. Second, for tasks at the same level, check the number of successors: the more successor nodes a task has, the higher its priority. Third, check s_vi: tasks with a shorter software execution time are preferred.
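The three-step priority rule in assumption 4) amounts to a lexicographic sort. A minimal sketch, assuming each task is described by a dictionary with the hypothetical keys level, num_successors and sw_time:

```python
def software_priority_order(tasks):
    """Order software tasks by level (ascending), then by number of
    successors (descending), then by software execution time (ascending)."""
    return sorted(
        tasks,
        key=lambda t: (t["level"], -t["num_successors"], t["sw_time"]),
    )


tasks = [
    {"name": "t1", "level": 1, "num_successors": 1, "sw_time": 5},
    {"name": "t2", "level": 0, "num_successors": 2, "sw_time": 9},
    {"name": "t3", "level": 1, "num_successors": 3, "sw_time": 7},
    {"name": "t4", "level": 1, "num_successors": 3, "sw_time": 4},
]
order = [t["name"] for t in software_priority_order(tasks)]
```

Here t2 runs first because it is on the earliest level; t4 precedes t3 because both have three successors but t4 is shorter; t1 runs last.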
C. PROBLEM P AND RELATED METHODS
The resolution of a problem requires the definition of a model representing all the important issues related to the specific problem [39] . In this paper, the input of the HW/SW partitioning algorithm is G , the task nodes are mapped to the hardware and software PEs, and the algorithm output is represented by x i , where x i ∈ (0, 1),
. . , v n ) represents the execution time of the computing task, and the area cost A(a 1 , a 2 , . . . , a n ) of the PEs provided to the corresponding task. The HW/SW partitioning problem discussed in this paper can be formulated as the following nonlinear minimization problem:
For a given area cost A, we seek the partitioning result with the least task execution time. The area cost of executing a task on the software PE is negligible; for the convenience of this discussion, we regard it as cost free.
In previous work, the TS algorithm has been shown to outperform SA and GA on this problem [34]. Therefore, there are many improved algorithms based on TS, such as tabu search simulated annealing (TSSA) [43]. TSSA generates neighbors following the idea of SA, so that a worse solution can be accepted with a certain probability, which effectively helps the search escape local optima. TSSA has strong hill-climbing ability and combines the advantages of the two algorithms. The genetic algorithm tabu search (GATS) [44] uses GA as the main framework and TS as the mutation operator. It searches the entire solution space, so it has strong hill-climbing ability, while the tabu tables provide memory. Both algorithms improve on the TS algorithm to a certain degree.
In summary, no single approach has an absolute advantage in both runtime and solution quality for problem P. Therefore, this manuscript proposes a novel approach to solve problem P.
--A novel critical path algorithm is proposed based on configurable crossbars to ensure that system performance greatly improves, even when large communication costs are generated.
--In particular, we combine the advantages of exact algorithms and heuristic algorithms. A specific solution is obtained first and used as the initial solution of the TS algorithm, so that the global optimal solution can be located with fewer iterations.
--Furthermore, our memory-reinforced TS algorithm maintains a dual memory table. In this way, our strategy can enhance the search directivity and eventually improve the solution quality.
Experimental results show that the algorithm is effective.
III. ALGORITHM FOR PARTITIONING
A. OVERVIEW OF PROPOSED MTSP ALGORITHM
In our method, three algorithms are executed sequentially to output the partitioning result. The input of MTSP is the task graph G, and the critical path algorithm outputs the critical path task graph (CG). Because TS is a local neighborhood search algorithm, a good initial solution can effectively enhance the quality of its final solution. The solution of the heuristic algorithm is therefore used as the initial solution of the TS algorithm, which is executed last to obtain the partitioning solution. An overview of the proposed MTSP algorithm is shown in Figure 3.
B. CRITICAL PATH (CP) ALGORITHM
In our previous work [45], high-bandwidth, nonblocking, low-latency transmissions were achieved with configurable crossbars. We select some of the PEs that execute the same task queue. Such a task queue is called a task chain (TC), and a task chain satisfies:
Tasks within a TC are given a communication interface through the Xilinx Fast Simplex Link (FSL) bus in a configurable crossbar, which allows MicroBlaze to directly access the FIFO. We treat the interconnection between tasks in the same TC as a type of local communication with no time cost.
To locate the task chains, we use the critical path algorithm to traverse all the task queues from V_start to V_end and sort them in descending order of communication cost. A TC is selected by the formula below:
Task nodes N and edges E are deleted from the task graph when they are added to a TC. The above steps are executed iteratively until there is no path from V_start to V_end. Furthermore, traversing all the edges whose start and end nodes both remain in the task graph, the CP algorithm repeatedly applies formula (7) until no edge between remaining task nodes is left. All TCs are then output and the algorithm terminates. Figure 4 shows the task graph simplification for radar signal processing. First, we obtain 6 task queues from V_start to V_end, calculate the communication cost of each task path, and sort them in descending order of communication cost. TC_1 is selected from the task queues by formula (7), and nodes V_1, V_3, V_6 and edges V_1V_3, V_3V_6 are deleted at the same time. After repeating the above steps twice, only one task node, V_5, remains in the task graph, and the algorithm terminates. The output is TC_1 = {V_1 → V_3 → V_6} and TC_2 = {V_2 → V_4 → V_7}. Finally, we convert CG into CG' according to formula (4). Considering that most task graphs are large, the whole algorithm maintains an Open list and a Close list.
The TCs can be accurately found by Algorithm 1. The TCs are marked in CG, the interconnection cost is reduced according to (2), and CG' is output. The time complexity is O[(m + n) log n], where m and n are the numbers of edges and nodes in the task graph, respectively.
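The first phase of the task chain extraction can be sketched as follows. This is only an illustration under stated assumptions: the selection rule is taken to be "pick the remaining path from V_start to V_end with the largest total communication cost", the second phase over leftover edges is omitted, and paths are enumerated exhaustively, so the sketch does not achieve the O[(m + n) log n] complexity or use the Open and Close lists. The function names and the example graph are hypothetical.

```python
def all_paths(succ, src, dst, path=()):
    """Yield every path from src to dst in a DAG given as adjacency lists."""
    path = path + (src,)
    if src == dst:
        yield list(path)
        return
    for nxt in succ.get(src, ()):
        yield from all_paths(succ, nxt, dst, path)


def extract_task_chains(edges, start, end):
    """edges: dict (u, v) -> communication cost. Returns a list of chains."""
    remaining = dict(edges)
    chains = []
    while True:
        succ = {}
        for (u, v) in remaining:
            succ.setdefault(u, []).append(v)
        paths = list(all_paths(succ, start, end))
        if not paths:
            break  # no path from start to end is left

        def comm_cost(p):
            return sum(remaining[(a, b)] for a, b in zip(p, p[1:]))

        best = max(paths, key=comm_cost)
        chain = best[1:-1]  # drop the virtual start and end nodes
        if not chain:
            del remaining[(start, end)]  # direct start->end edge, no tasks
            continue
        chains.append(chain)
        removed = set(chain)
        # Delete the chain's nodes together with every edge touching them.
        remaining = {
            (u, v): c for (u, v), c in remaining.items()
            if u not in removed and v not in removed
        }
    return chains


# Hypothetical seven-task graph with virtual endpoints "S" and "E".
edges = {
    ("S", 1): 0, ("S", 2): 0,
    (1, 3): 5, (3, 6): 5, (1, 5): 1, (5, 6): 1,
    (2, 4): 3, (4, 7): 3,
    (6, "E"): 0, (7, "E"): 0,
}
chains = extract_task_chains(edges, "S", "E")
```

In this example the path through tasks 1, 3, 6 carries the largest communication cost and is extracted first; tasks 2, 4, 7 form the second chain, and task 5 is left without a chain.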
C. HEURISTIC ALGORITHM (HA)
The proposed algorithm borrows the idea used in solving the 0-1 knapsack problem. We initialize the solution of problem P to [0, 0, ..., 0], i.e., all tasks are executed by software PEs, and HA gradually moves tasks to hardware PEs. Formally, let b_v denote the benefit of moving task v to hardware [46], and let T_v denote the set of software tasks (including v) lying at the same precedence level as the software task v.
The area penalty differs from task to task when executed on hardware PEs, and the total area A of the reconfigurable MPSoC is limited; the efficiency e_v is therefore defined and calculated as follows:
In the CG', the benefit b_v corresponds to the profit in the knapsack problem. Obviously, the speed of the key nodes has a great impact on the task execution time. Therefore, the task v_i with the highest e_vi value should be assigned to the hardware PE first.
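A knapsack-style greedy pass of this kind can be sketched as below. The sketch makes two assumptions that are not confirmed by the text: the benefit b_v of each task is supplied as a precomputed input, and the efficiency is taken as e_v = b_v / a_v, i.e., benefit per unit of hardware area. The function and parameter names are hypothetical.

```python
def heuristic_initial_solution(benefit, area, area_limit):
    """Greedy 0-1 assignment: 0 = software, 1 = hardware.

    benefit[i]: assumed precomputed benefit b_v of moving task i to hardware
    area[i]:    hardware area a_v required by task i (must be > 0)
    area_limit: total available hardware area A
    """
    n = len(benefit)
    x = [0] * n  # start with every task in software
    # Assumed efficiency: benefit per unit area, highest first.
    order = sorted(range(n), key=lambda i: benefit[i] / area[i], reverse=True)
    used = 0
    for i in order:
        if benefit[i] <= 0:
            continue  # moving this task to hardware does not help
        if used + area[i] <= area_limit:
            x[i] = 1
            used += area[i]
    return x


x0 = heuristic_initial_solution(benefit=[6, 10, 3], area=[2, 5, 3], area_limit=5)
```

With these example numbers the efficiencies are 3, 2 and 1. Task 0 is moved first, task 1 no longer fits in the remaining area, and task 2 fills the budget exactly.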
The time complexity of HA is O(2n).
TS is a heuristic local search algorithm proposed by Glover and Laguna [47]. The basic idea is to add the history of recent search movements to a tabu list during the search process to prevent the search from cycling. This paper proposes a memory-reinforced tabu search (MTS) algorithm applied to the partitioning problem P. Because a good initial solution can effectively improve the quality of the final solution of the TS algorithm, we use the solution generated by HA as the initial solution of MTS. MTS maintains a tabu list with a maximum length of l that stores the forbidden movements. Considering that a partition solution of P is a sequence containing only 0 and 1, the movement S_c can be formulated as (11), where S_local denotes the solution for the current iteration and S'_local denotes the solution for the last iteration.
The tabu list can prohibit l types of movement in the current iteration and is organized as a FIFO queue: whenever a new value is stored in a full list, the earliest stored value is released. Tabu degree (TD) information is saved according to the number of times a movement has been added to the tabu list. The neighborhood search strategy divides the movements {S_c} into three categories: not prohibited {U}, prohibited but beneficial {FB}, and prohibited and not beneficial {FN}. When {U} ∪ {FB} ≠ ∅, the movement of the next iteration is chosen as S_c ∈ {U} ∪ {FB}.
S_c ∈ {FB} means that a prohibited movement yields a solution with good performance, so the search can 'break the law' and accept it. When every movement within the neighborhood is prohibited and performs poorly, i.e., all S_c ∈ {FN}, we select the movement with the smallest tabu degree as the S_c of the next iteration.
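The FIFO tabu list, the tabu degree, and the three-way classification of movements can be sketched as follows. Two points are assumptions rather than statements from the text: a prohibited movement is treated as "beneficial" when it leads to a cost below the best cost found so far, and among the admissible movements the one with the lowest resulting cost is chosen. The class and function names are hypothetical.

```python
from collections import Counter, deque


class TabuList:
    """FIFO tabu list of bounded length that also tracks tabu degrees."""

    def __init__(self, length):
        self.queue = deque(maxlen=length)  # oldest entry drops out when full
        self.degree = Counter()            # times each move was made tabu

    def add(self, move):
        self.queue.append(move)
        self.degree[move] += 1

    def __contains__(self, move):
        return move in self.queue


def choose_move(candidates, tabu, cost_after, best_cost):
    """Pick the next move from U, FB, or (as a fallback) FN.

    cost_after: dict move -> cost of the solution reached by that move
    best_cost:  cost of the best solution found so far
    """
    unprohibited = [m for m in candidates if m not in tabu]            # {U}
    beneficial = [m for m in candidates
                  if m in tabu and cost_after[m] < best_cost]          # {FB}
    admissible = unprohibited + beneficial
    if admissible:
        return min(admissible, key=lambda m: cost_after[m])
    # Every move is prohibited and not beneficial ({FN}):
    # fall back to the move with the smallest tabu degree.
    return min(candidates, key=lambda m: tabu.degree[m])
```

Moves must be hashable in this sketch, for example tuples of the flipped bit positions.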
Considering that the tabu table saves only a short-term memory of the movements, MTS additionally searches around the neighborhood of the optimal solution of the previous iteration, using a hash table as the second memory table. When a solution is searched and its INT_KEY is the same as a value stored in the hash table, it is considered a conflict. Resolving conflicts is a necessary prerequisite for the effective use of hash technology. We evaluate the conflicting solutions through problem P: the better solution is saved under the corresponding INT_KEY. A movement that brings no improvement is added to the tabu table.
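The conflict rule described above (keep the better of two solutions that share a key) can be illustrated with a small sketch. How INT_KEY is computed is not recoverable from the text, so the key function below, which reads the 0/1 sequence as a binary number modulo a table size, is purely a placeholder assumption, as are the class and method names.

```python
def int_key(solution, table_size):
    """Placeholder key: the 0/1 sequence as a binary number, modulo table_size."""
    value = int("".join(str(bit) for bit in solution), 2)
    return value % table_size


class SolutionMemory:
    """Hash table keeping, for each key, the best solution seen so far."""

    def __init__(self, table_size=1024):
        self.table_size = table_size
        self.table = {}

    def offer(self, solution, cost):
        """Store the solution if its slot is empty or it beats the stored one.

        Returns True if the solution was stored, False otherwise (in which
        case the caller could add the corresponding move to the tabu list).
        """
        key = int_key(solution, self.table_size)
        stored = self.table.get(key)
        if stored is None or cost < stored[1]:
            self.table[key] = (tuple(solution), cost)
            return True
        return False


memory = SolutionMemory(table_size=4)
first = memory.offer([0, 0, 0, 1], cost=10)   # empty slot, stored
second = memory.offer([0, 1, 0, 1], cost=8)   # same key, better, replaces
third = memory.offer([1, 0, 0, 1], cost=9)    # same key, worse, rejected
```

All three example solutions map to the same key, so only the cheapest one survives in the table.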
In the process of MTS, S_best is updated only when the performance of S_local is better than that of S_best under the given condition. Each movement flips two bits at random, i.e., Σ_{i=1}^{n} S_ci = 2, where n is the number of tasks in problem P. When the number of iterations reaches N without updating S_best, the search is adjusted to Σ_{i=1}^{n} S_ci = 1. This adjustment is executed only once. The termination criterion is that the total number of search iterations reaches M or that all tasks have been transferred to hardware execution; the output S_best is the final solution of the algorithm.
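A movement that flips a fixed number of randomly chosen bits can be sketched as below. The sketch assumes that a movement is a 0/1 mask applied to the current solution by bitwise XOR, which is consistent with the constraint that the entries of S_c sum to 2 (or 1) but is not confirmed by formula (11). The function names are hypothetical.

```python
import random


def random_move(n, flips, rng=random):
    """Return a 0/1 mask of length n with exactly `flips` ones."""
    move = [0] * n
    for i in rng.sample(range(n), flips):
        move[i] = 1
    return move


def apply_move(solution, move):
    """Flip the bits of `solution` selected by `move` (bitwise XOR)."""
    return [x ^ m for x, m in zip(solution, move)]


rng = random.Random(0)                    # seeded for reproducibility
two_bit = random_move(6, flips=2, rng=rng)
one_bit = random_move(6, flips=1, rng=rng)
neighbor = apply_move([0, 1, 0], [1, 1, 0])
```

The caller would use flips=2 at first and switch to flips=1 once N iterations have passed without improving S_best.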
IV. RESULTS AND DISCUSSION
The proposed algorithms were simulated in C on an Intel(R) Core(TM) CPU @ 2.00 GHz processor with 8 GB of memory. To make a fair comparison with TSSA and GATS, and considering that these algorithms are heuristics rather than exact algorithms, a general test benchmark is necessary; our implementations are based on the same type of task graph as used in [21]. All experiments use the priority scheduling from our previous work [48]. The task graphs used in this paper are common structures generated uniformly at random by Task Graphs For Free (TGFF). The number of task nodes is configurable, and the task topologies include in-tree, out-tree, fork-join, mean-value analysis and fast Fourier transform (FFT) task graphs, as shown in Figure 5. The task graph generation parameters are shown in Table 2. The communication cost is classified into two cases, each of which follows a uniform random distribution in its interval. This manuscript introduces the concept of the communication-to-computation ratio (CCR). The value range of (1) represents computationally intensive tasks, corresponding to CCR_L. The value range of (2) represents tasks in which computation and communication are balanced, corresponding to CCR_H. The performance of the algorithm is compared in both CCR test environments.
Assume that the area of hardware PEs required by task i is a_i. Because the available PEs are limited, the total hardware area available in problem P is A = α · Σ_i a_i, where the hardware coefficient α ∈ [0%, 100%] is adjusted to simulate the allocation of tasks to hardware PEs in an actual application. The relevant parameters of the MTSP algorithm are shown in Table 3. To measure the effectiveness of the algorithm, we define two metrics, the acceleration ratio AR and the improvement degree ImpD. We assume that the search starts with all tasks executed in software, i.e., we define sol[0, 0, ..., 0] as the initial solution, which has the maximum running time.
Here, ImpD is the improvement of algorithm A over algorithm B:
Let θ_1 and θ_2 be the two types of ImpD in our experiments: θ_1 denotes the improvement in the execution time of the algorithm's solution, and θ_2 denotes the improvement in the algorithm's search time.
First, we investigated the effect of HA on the quality of the MTS solution. HA+MTS denotes MTS starting from the heuristic solution of HA, and RD+MTS denotes MTS starting from a trivial solution, which is generated randomly for problem P under the area limit A. HA, RD+MTS and HA+MTS are compared in Figure 6 with α = 50%; the input task graphs do not consider communication cost, i.e., c(u, v) = 0, ∀(u, v) ∈ G. As the number of tasks increases, the AR of RD+MTS decreases from 1.30 to 1.10, HA+MTS always approaches the optimal solution, and the AR of HA stays in the range [1.18, 1.26]. In conclusion, HA+MTS is the best among the three algorithms.
Figure 7 shows θ_1 of MTS with the CP algorithm (CP+MTS) over MTS for different α as the number of nodes in the task graph is varied. Figure 7a shows the computationally intensive condition CCR_L: as the number of tasks increases, θ_1 shows a downward trend when α = 0.1 and α = 0.3, with an overall decline of up to 66.08%. For α = 0.5 and α = 0.7, the ImpD is relatively stable; in the case of α = 0.7 and N = 400, θ_1 = 2.23%. For CCR_L, when α is small, the influence of the communication penalty on the AR decreases; for larger α, however, the hardware PEs are sufficient and θ_1 is more stable. As shown in Figure 7b, θ_1 is positively correlated with α: as α increases, θ_1 increases significantly on average. In this comparison, the CP+MTS algorithm has obvious advantages and is more stable under CCR_H. In summary, CP+MTS is clearly superior to MTS.
Task graphs have different topologies and, at the same time, different ratios of nodes to edges, which also affects the CP algorithm. The performance of the CP algorithm is therefore analyzed for task graphs of five different topologies. Figure 8 shows the improvements of CP+MTS over MTS for all types of task graphs considered. The communication penalty is set to CCR_H, and the task number is N = 100. As α increases, θ_1 also increases gradually. The performance differs considerably across topologies: the average value of θ_1 is 16% for out-tree and 12.2% for in-tree. The average number of successor nodes impacts the performance of CP+MTS, which is clearly reflected when α ∈ [0.3, 0.7]. In other words, CP+MTS performs better when the node-to-edge ratio is larger.
Figure 9 shows the comparisons of solution quality produced by HA, TSSA, GATS and MTSP for the task graph with 200 nodes. When the hardware area percentage α exceeds 80%, the solution quality of TS is already good enough, and it is hard to make great improvement. MTSP and TSSA keep positive returns, MTSP has the largest θ_1 in the range of α ∈ [0, 0.3], and the hybrid algorithms all perform well in the range of α ∈ [0.35, 0.55]. The average θ_1 of MTSP is 13%, which is larger than those of TSSA (7%) and GATS (7%).
In the comparisons of searching time, HA has the fastest searching time, less than 0.02 seconds; MTSP, TSSA and especially GATS suffer from higher search times, and the search time of MTSP tends to decrease with increasing α. The average θ_2 of MTSP over TSSA and GATS is 53% and 66%, respectively. Because MTSP maintains a dual memory table, it is more efficient in search; in addition, the search is terminated when the solution does not improve over a period of time. Therefore, the optimal solution can be obtained faster. GATS, in contrast, is based on GA, and its search is slower because it maintains a population to iterate. In summary, MTSP is clearly superior to the other approaches.
As shown in Figure 10, by changing the hardware area percentage α and the number of tasks N, we record the number of iterations MTSP needs to arrive at the optimal solution for randomly generated task graphs with 100 nodes. Figure 10a shows that, when the hardware area is 60%, the majority of cases obtain the best solution within 10 iterations; only 5% of the solutions require more than 10 but fewer than 140 iterations. According to the nonimprovement threshold, the final number of iterations of MTSP will not exceed 350. Figures 10b and 10c show the cases of α = 70% and α = 80%, respectively. The variance of the iteration number gradually increases, and the maximum iteration number increases from 458 to 748. For Figure 10b, only 10% of the iteration numbers fall in the range [100, 500]. For Figure 10c, however, 24% of the iteration numbers fall in the range [100, 800]. This is because the quality of the solution obtained by the HA algorithm decreases gradually as α increases; hence, MTSP requires more iterations to obtain the optimal solution. For task graphs with more than 200 nodes, we enlarge the length of the tabu list and the neighborhood size to 50.
As shown in Figure 11, the available hardware area is set to α = 80%, and as the number of task nodes increases, the variance of the iteration number gradually increases. When N = 200, the optimal solution is reached in fewer than 100 iterations in 61% of the cases. When N increases to 600, the optimal solution is reached in fewer than 150 iterations in only 40% of the cases.
Although the quality of the initial solution degrades as the number of task nodes increases, MTSP can still obtain the optimal solution through a large number of iterations thanks to the dual memory table, and it eliminates the search restriction [21] caused by the memory limitation of the TS algorithm on large-scale task graphs. When the task number is 600, the maximum number of iterations is 1431; therefore, a maximum of 2000 iterations is sufficient to locate the optimal solution when N ≤ 600. Because of the nonimprovement threshold, the time spent on invalid search is reduced. Figures 10 and 11 show that, with increasing α and N, MTSP can locate the optimal solution for large-sized problems.
V. CONCLUSIONS AND FUTURE WORK
In this paper, we presented a memory-reinforced tabu search algorithm with critical path awareness for HW/SW partitioning on reconfigurable MPSoCs. The contributions of this work to the field are as follows. First, we simplify the input task graph by reducing the quantity of data while retaining the complete information, which improves processing efficiency. Second, through the critical path algorithm, the crossbar is configured according to the output task chains, and the communication penalty of the task graph is reduced. Third, the solution of an HA algorithm is used as the initial solution of the TS algorithm, which caters to the starting-point-sensitive nature of TS search and significantly improves search efficiency. Fourth, dual memory tables based on hash technology improve the search strength and effectiveness of the algorithm. The above technical details and strategies fully utilize the system characteristics of MPSoCs, greatly reducing the runtime and significantly improving the quality of the solution.
Although numerous experiments confirm the effectiveness of MTSP, there are still many areas worthy of further research, such as making full use of partial reconfiguration of FPGAs and adopting effective methods to solve the dynamic partitioning problem. The proposed technique is designed for an MPSoC with a processor that executes only one task at a time and does not allow interruption. This kind of system is unsuitable for supporting either multithreaded software execution or the execution of interrupt service routines (ISRs). These architectural requirements lead us, in future work, to explore algorithms for HW/SW partitioning on different types of systems and to combine them with more runtime task problems.
