Abstract-This paper studies mapping algorithms for Network-on-Chip (NoC). The main work and contributions are as follows: building on existing on-chip network mapping algorithms and global optimization algorithms, a multi-step mapping algorithm for low power consumption is designed, which combines mapping with task allocation and task scheduling. Compared with traditional mapping algorithms, the proposed algorithm takes task scheduling and allocation into account and consists of three steps: task scheduling, IP core mapping and data block mapping. The simulation results show that the proposed mapping method can effectively reduce NoC power consumption.
according to its characteristics, studying for on-chip network topology is of great significance.
In this paper, we mainly study the mapping algorithm. The main work and contributions are as follows: based on research into existing on-chip network mapping algorithms and global optimization algorithms, a multi-step mapping algorithm for low power consumption is designed, which combines mapping with task allocation and task scheduling. Compared with traditional mapping algorithms, the algorithm in this paper takes task scheduling and allocation into account and has three steps: task scheduling, IP core mapping and data block mapping. The simulation results show that the mapping method in this paper can effectively reduce NoC power consumption.
II. DESCRIPTION OF THE PROBLEM
With the continued development of NoC technology, NoC will be used in various fields. The mapping problem [13] [14] [15] of the NoC platform is an important part of the NoC design process: finding the best mapping for a given application can effectively reduce the power consumption and cost of the NoC. On-chip network mapping assigns IP cores to the tiles of the topology so that certain constraints, such as bandwidth limitations, are met. An optimized mapping algorithm is the means of obtaining the optimal mapping scheme. Designing a mapping algorithm involves two components: designing the objective function to be optimized, and solving the resulting mapping problem. We first analyze the optimization objective function models used in current mapping algorithms; based on the relevant literature, they can be roughly summarized as follows:
A. Power Optimization
NoC power consumption [16] [17] can be divided into computing power consumption and communication power consumption. Computing power consumption is the power consumed by a resource node in executing its tasks, and its value is generally fixed for a specific application task. Communication power consumption is the power consumed by data communication between resource nodes. For a regular architecture, a communication energy formula was proposed in [16]. The energy consumed when a flit passes through one router is given by (1).
where $n_{hops}$ is the number of routing hops in the transmission process. From equation (3) we can conclude that power consumption is proportional to the number of routing hops. Under shortest-path routing, the number of routing hops from $t_i$ to $t_j$ is the Manhattan distance between them.
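The relation between hop count and Manhattan distance can be illustrated with a short sketch. It assumes that tiles are addressed by (x, y) coordinates on the mesh and that traffic is given as a flit count per task pair; the names `manhattan_hops`, `total_weighted_hops`, `traffic` and `placement` are hypothetical and do not come from [16].

```python
def manhattan_hops(src, dst):
    """Routing hops between two mesh tiles under shortest-path routing."""
    (x1, y1), (x2, y2) = src, dst
    return abs(x1 - x2) + abs(y1 - y2)


def total_weighted_hops(traffic, placement):
    """Sum of flits * hops over all communicating task pairs.

    traffic:   dict mapping (task_i, task_j) -> number of flits sent
    placement: dict mapping task -> (x, y) tile coordinate on the mesh
    """
    return sum(
        flits * manhattan_hops(placement[a], placement[b])
        for (a, b), flits in traffic.items()
    )


# Hypothetical example: three tasks placed on a 2 x 2 mesh.
placement = {"t1": (0, 0), "t2": (1, 0), "t3": (1, 1)}
traffic = {("t1", "t2"): 10, ("t1", "t3"): 4}
print(total_weighted_hops(traffic, placement))  # 10*1 + 4*2 = 18
```

Because the energy per flit grows with the hop count, a placement with a smaller weighted hop total is expected to consume less communication energy under this model.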
B. Delay-oriented Optimization
NoC delay optimization is essentially divided into two parts: the waiting time caused by congestion and the inherent transmission time determined by distance. The transmission delay $T_{i,j}$ of a packet from the source node $t_i$ to the target node $t_j$ is defined as the period that begins when $t_i$ generates the first flit and ends when the last flit of the packet arrives at $t_j$. The model of $T_{i,j}$ is shown in (4),
where $T_b$ is the time a flit needs to pass through one switch and one link without congestion, $T_w$ is the average waiting time of the header flit at a switch node under congestion, $h_{i,j}$ is the Manhattan distance from $t_i$ to $t_j$, and $B$ is the number of flits in the packet; $T_b$ and $B$ are constant coefficients. The first term in (4) is the time for the head flit to reach the destination. The second term is the time between the arrival of the head flit and the arrival of the subsequent flits, which is constant. Therefore, the optimization space lies in the first term, that is, in the parameters $T_w$ and $h_{i,j}$.
C. Load Balancing Optimization
Balancing the link load is an effective way to alleviate network congestion, which effectively reduces the average waiting time $T_w$; therefore, to some extent, establishing a load balancing model also reduces the NoC delay. The degree of link load imbalance is expressed by the link load variance, defined in equation (5).
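A link load variance can be computed along the following lines. This is a minimal sketch under stated assumptions: XY routing (X direction first, then Y), the population variance taken over all directed links of the mesh, and hypothetical names throughout; the exact form of equation (5) may differ.

```python
from collections import defaultdict
from statistics import pvariance


def xy_route_links(src, dst):
    """Directed links crossed by XY routing (X first, then Y) from src to dst."""
    x, y = src
    links = []
    while x != dst[0]:
        nx = x + (1 if dst[0] > x else -1)
        links.append(((x, y), (nx, y)))
        x = nx
    while y != dst[1]:
        ny = y + (1 if dst[1] > y else -1)
        links.append(((x, y), (x, ny)))
        y = ny
    return links


def all_links(cols, rows):
    """Every directed link between adjacent tiles of a cols x rows mesh."""
    links = []
    for x in range(cols):
        for y in range(rows):
            for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nx, ny = x + dx, y + dy
                if 0 <= nx < cols and 0 <= ny < rows:
                    links.append(((x, y), (nx, ny)))
    return links


def link_load_variance(traffic, placement, cols, rows):
    """Population variance of the traffic volume carried by each directed link."""
    load = defaultdict(float)
    for (a, b), volume in traffic.items():
        for link in xy_route_links(placement[a], placement[b]):
            load[link] += volume
    return pvariance([load[link] for link in all_links(cols, rows)])


# Hypothetical example: one flow of 8 flits across a 2 x 2 mesh.
placement = {"a": (0, 0), "b": (1, 1)}
traffic = {("a", "b"): 8}
print(link_load_variance(traffic, placement, 2, 2))  # 12.0
```

In the example, two of the eight directed links carry 8 flits and six carry none, so the mean load is 2 and the variance is 12; a mapping that spreads traffic more evenly would score lower.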
D. Area Optimization
Area optimization is discussed less often than the other optimizations, and it mainly applies to irregular 2D mesh topologies. The 2D mesh is regarded as a grid of standard cells; each mapped IP core covers one or several cells, and IP cores are not allowed to overlap. Therefore, how to place the IP cores so that the final total area is smallest is the measure of an area optimization algorithm. Dynamic optimization ideas from mathematical optimization can be used to solve this problem.
From the previous analysis, the basic idea of NoC mapping is as follows. First, analyze the NoC communication principle, and then establish a reasonable optimization objective function model according to it, such as the power-oriented and delay-oriented optimization described above. The actual communication behavior of the NoC is then quantified: for example, a communication weight between IP cores expresses the communication cost between different IP cores, and the communication distance between any two points is determined by the routing algorithm; in general, it is the Manhattan distance. On this basis, a global dynamic optimization method is applied to the established objective function model to find the best mapping scheme.
These mapping optimizations are rather one-sided: they do not take into account the optimization of tasks to be executed in parallel or the optimization of shared data blocks. In a typical multiprocessor system (CMP), the design process includes two important steps: task allocation and task scheduling. Given a task graph, design constraints (execution time, power consumption, etc.) and an IP library (processing units), each task is first assigned to an appropriate processing unit, a step referred to as task allocation; the execution order of the tasks on each processing unit is then arranged, a step referred to as task scheduling. For NoC design, however, two further steps are needed: mapping and path allocation. This article brings task scheduling and task allocation into the design of the mapping algorithm, so that the optimization considers more factors. Power-oriented mapping optimization is carried out in three stages: task scheduling, IP core mapping and data unit mapping.
III. PROPOSED SCHEME
In this article, we mainly study low-power optimized mapping for NoC with the 2D mesh as the platform. In this platform, adjacent nodes communicate over two links: if A and B are any two adjacent communication nodes in the 2D mesh, one link runs from A to B and the other from B to A. Wormhole routing and switching is used. In this model, each node consists of a router module for communication, an IP core module responsible for processing data, and a memory module used to store data (similar to a CPU cache).
Based on the relevant literature [18] [19] [20], the computing behavior of an IP core is abstracted into three kinds of activities. First, reading information: when Node A reads information held by Node B, Node A sends Node B a packet carrying the request and the address information, and Node B then sends the requested information back to Node A. Second, writing information: when Node A updates information and saves it to Node B, Node A sends Node B a packet carrying the updated information and the destination address. Third, processing information: the IP core processes data as required. Finally, the processing performed by the IP cores is abstracted as a set of tasks, and the data to be processed by the IP cores is abstracted as a set of data.
A. Task Scheduling
A specific task can always be represented by some form of task graph (TG). Task scheduling assigns the tasks specified by the TG to different virtual IP cores according to certain rules, that is, it determines when and on which IP core each task is processed. The main purposes of task scheduling in this article are: 1) to minimize IP idle time; 2) to arrange the task execution order reasonably; 3) to make the data used by a task on an IP core be reused, as far as possible, by the next task executed on the same IP core. We improve the idea of the CMP-based task scheduling algorithm [21]; the improved task scheduling algorithm is described as follows:
In the task scheduling algorithm given above, $T$ is the set of tasks, so any task $t_i \in T$, and $g = (T, E)$ is the task graph (TG), where $(t_1, t_2) \in E \subseteq T \times T$ means that task $t_2$ can be executed only after $t_1$ is completed, and each task $t$ has an associated execution time. This mapping design assumes that accesses by an IP core to the memory of its own node, and the data computation itself, have zero delay (local reads and writes are negligible compared with communication between nodes). The data set used by the tasks can be split into multiple data modules; $D$ is the set of data modules, and for any data module $d \in D$ the functions $r(t, d)$ and $w(t, d)$ are the numbers of reads and writes of the task $t$ on the data module $d$, respectively. For any virtual processor $v$, the algorithm defines two functions $Q(v)$ and $D(v)$: $Q(v)$ describes the availability of the virtual processor at the current time, and $D(v)$ is the set of data modules stored in the processor at that moment. $B(t)$ is the collection of data modules $d$ that the task $t$ needs to call, and its function is as follows:
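A minimal sketch of a scheduler in this spirit is given here for illustration; it is not the authors' improved algorithm. It assumes a greedy list scheduler that starts each ready task on the virtual processor that can begin it earliest and breaks ties in favor of the processor already holding the most data of $B(t)$. Here `free_at` plays the role assumed for $Q(v)$, `held` the role of $D(v)$ and `data_of` the role of $B(t)$; all names are hypothetical.

```python
def schedule(exec_time, edges, data_of, num_procs):
    """Greedy list scheduling with a data-affinity tie-break.

    exec_time: dict task -> execution time
    edges:     iterable of (before, after) precedence pairs of the task graph
    data_of:   dict task -> set of data modules the task calls (role of B(t))
    Returns a dict task -> (virtual processor index, start time).
    """
    preds = {t: set() for t in exec_time}
    for before, after in edges:
        preds[after].add(before)

    free_at = [0] * num_procs                  # assumed role of Q(v)
    held = [set() for _ in range(num_procs)]   # role of D(v)
    finish, assignment = {}, {}
    remaining = set(exec_time)

    while remaining:
        # A task is ready once all of its predecessors have been scheduled.
        ready = sorted(t for t in remaining if preds[t].issubset(finish))
        if not ready:
            raise ValueError("task graph contains a cycle")
        task = ready[0]
        earliest = max((finish[p] for p in preds[task]), default=0)

        def key(v):
            start = max(free_at[v], earliest)
            overlap = len(held[v] & data_of[task])
            return (start, -overlap, v)  # earliest start, then most shared data

        v = min(range(num_procs), key=key)
        start = max(free_at[v], earliest)
        finish[task] = start + exec_time[task]
        free_at[v] = finish[task]
        held[v] = set(data_of[task])
        assignment[task] = (v, start)
        remaining.remove(task)
    return assignment


# Hypothetical example: c reuses the data of b, d reuses the data of a.
exec_time = {"a": 2, "b": 2, "c": 1, "d": 1}
edges = [("a", "c"), ("b", "d")]
data_of = {"a": {0, 1}, "b": {2, 3}, "c": {2, 3}, "d": {0, 1}}
print(schedule(exec_time, edges, data_of, num_procs=2))
```

In the example, both processors are free at time 2, so the tie-break places task c on the processor that last ran b and task d on the processor that last ran a, which is the data reuse described in purpose 3).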
B. IP Core Mapping
The second phase is IP core mapping. The main work of this stage is to map the virtual IP cores to physical IP cores, and the optimization objective is to place IP cores that call the same data as close to each other as possible. To achieve this, the objective function F is introduced:
where $V$ is the set of virtual IP cores, the mapping applied to $v_1$ gives the physical IP core assigned to the virtual IP core $v_1$, and $C(v_1, v_2)$ is the correlation function between virtual IP cores, which is expressed as:
In formula (12), $g(v, d)$ is the total reading cost of the virtual IP core $v$ for the data module $d$ over its tasks, expressed as follows:
Reading information requires a round trip, hence the term $2r(t, d)$; the task set of the virtual IP core $v$ is the set of tasks assigned to it by task scheduling. In formula (3), the distance between the mapped positions of two virtual IP cores is the shortest distance between the corresponding physical IP cores on the 2D mesh, that is:
In formula (11), the mapping function represents a mapping scheme from virtual IP cores to physical IP cores. At this stage, the ant colony algorithm is adopted to solve the global dynamic optimization problem. The parameters of the ant colony algorithm and their initialization are defined in Table I.
TABLE I. ANT
2) Select a core to assign to $r_2$ in accordance with the selection probability.
In one loop, every ant executes the above process once, which yields as many mapping solutions as there are ants. As the evolutionary process advances, the pheromone left behind gradually evaporates; the attenuation parameter gives the degree of attenuation of the pheromone [24]. When all ants have completed one cycle, the pheromone is updated. The mathematical model of the pheromone update is as follows:
where $\Delta\tau_{i,j}(t, t+1)$ is the pheromone gain from $t$ to $t+1$, expressed as:
From the expression of $\Delta\tau_{i,j}(t, t+1)$ and the cost of each mapping scheme, we can see that the optimal solution, which has the minimum cost, makes the largest contribution to the pheromone update.
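The ant colony search over mappings can be sketched as follows. This is an illustrative ant-system variant, not the authors' parameterization: it assumes the cost takes the form of a sum of correlation weights times the Manhattan distance of the mapped cores, that `rho` is the fraction of pheromone retained after each cycle, and that each ant deposits `1 / (1 + cost)` so that cheaper mappings reinforce their choices more. All names and the example weights are hypothetical.

```python
import random
from itertools import permutations


def manhattan(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def mapping_cost(mapping, comm):
    """Assumed form of F: sum of C(v1, v2) * mesh distance of the mapped cores."""
    return sum(w * manhattan(mapping[v1], mapping[v2])
               for (v1, v2), w in comm.items())


def aco_map(cores, tiles, comm, ants=10, iterations=30, rho=0.5, seed=0):
    """Search for a low-cost assignment of virtual cores to mesh tiles."""
    rng = random.Random(seed)
    pheromone = {(c, t): 1.0 for c in cores for t in tiles}
    best, best_cost = None, float("inf")
    for _ in range(iterations):
        solutions = []
        for _ in range(ants):
            free = list(tiles)
            mapping = {}
            for core in cores:
                # Pick a free tile with probability proportional to pheromone.
                weights = [pheromone[(core, t)] for t in free]
                tile = rng.choices(free, weights=weights)[0]
                free.remove(tile)
                mapping[core] = tile
            cost = mapping_cost(mapping, comm)
            solutions.append((mapping, cost))
            if cost < best_cost:
                best, best_cost = mapping, cost
        # Evaporation, then deposit: lower cost means a larger contribution.
        for key in pheromone:
            pheromone[key] *= rho
        for mapping, cost in solutions:
            for core, tile in mapping.items():
                pheromone[(core, tile)] += 1.0 / (1.0 + cost)
    return best, best_cost


# Hypothetical example: four virtual cores on a 2 x 2 mesh.
cores = ["v1", "v2", "v3", "v4"]
tiles = [(0, 0), (1, 0), (0, 1), (1, 1)]
comm = {("v1", "v2"): 5, ("v2", "v3"): 3, ("v3", "v4"): 5, ("v1", "v4"): 1}
best, best_cost = aco_map(cores, tiles, comm)
# Exhaustive search is feasible only for tiny instances; used here as a check.
optimum = min(mapping_cost(dict(zip(cores, p)), comm) for p in permutations(tiles))
print(best_cost, optimum)
```

The sketch keeps the best mapping seen so far, so the returned cost can never be below the exhaustive optimum; on larger meshes the exhaustive check is impractical and only the search itself would be run.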
C. Data Unit Mapping
The third stage is data unit mapping [22]. In this stage, the data units called by the mapped tasks are mapped to the memory units of the physical IP cores. The cost function of the mapping is as follows:
In the above formula, the data unit $d$ is mapped to a node in $V$, and the mapping can be computed recursively.
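One simple way to realize a data unit placement is a greedy pass, sketched here under stated assumptions: the cost of placing a data unit on a node is the access count of each IP core times its Manhattan distance to that node, each node's memory holds a fixed number of units, and the most heavily accessed units are placed first. This is a stand-in for illustration, not the recursive procedure of the paper; all names are hypothetical.

```python
def manhattan(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def place_data_units(access, core_pos, tiles, capacity):
    """Greedy placement of data units onto node memories.

    access:   dict data unit -> dict core -> number of accesses
    core_pos: dict core -> (x, y) tile of the physical IP core
    tiles:    list of (x, y) tiles that have a memory module
    capacity: number of data units each memory module can hold
    Raises ValueError if the total capacity is exhausted.
    """
    used = {t: 0 for t in tiles}
    placement = {}
    # Place the most heavily accessed data units first.
    for d in sorted(access, key=lambda unit: -sum(access[unit].values())):
        candidates = [t for t in tiles if used[t] < capacity]
        best = min(
            candidates,
            key=lambda t: sum(n * manhattan(t, core_pos[c])
                              for c, n in access[d].items()),
        )
        placement[d] = best
        used[best] += 1
    return placement


# Hypothetical example on a 2 x 2 mesh with one data unit per memory.
tiles = [(0, 0), (1, 0), (0, 1), (1, 1)]
core_pos = {"p0": (0, 0), "p1": (1, 1)}
access = {"d0": {"p0": 4}, "d1": {"p1": 3}, "d2": {"p0": 1, "p1": 1}}
print(place_data_units(access, core_pos, tiles, capacity=1))
```

In the example, d0 and d1 land on the nodes of the cores that use them, and the shared unit d2 takes the first remaining node that is equally close to both cores.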
IV. SIMULATION RESULTS
The simulation topology is a square 2D mesh, and the power model for power-oriented mapping optimization, summarized at the beginning of this article, is:
$$E_{flit} = E_{S_{flit}} + E_{L_{flit}} \qquad (22)$$
Under this model, the energy consumed when $t_i$ sends one flit of data to $t_j$ is:
where $n_{hops}$ is the number of routing hops in the transmission process. From (15), power consumption is proportional to the number of routing hops. This article therefore uses the total number of hops of all tasks as the indicator of the degree of optimization.
A. Simulation Setup
First, the virtual task block is set up. The virtual task block consists of four loops, namely Loop1, Loop2, Loop3 and Loop4 [25] [26].
In Tables 2 to 5, $t \in T$ is a program segment that executes an operation block, $d$ is the data module processed by the matching $t$, $m$ is the total number of tasks in Loop1, to which Loop2, Loop3 and Loop4 are related, and $n$ is the number of smallest-unit data modules. Different tasks $t_i$ call different data modules. The total number of tasks in Table 2 is $m$, and each task corresponds to 10 data module units: for example, task $t_1$ corresponds to the 10 data modules numbered 0-9, task $t_2$ corresponds to the next 10 consecutive data modules, numbered 10-19, and so on up to task $t_m$. In Loop1, each task performs a write operation on its corresponding data modules.
Table 3 and Table 4 give the relations between the tasks and data modules of Loop2 and Loop3, respectively, where $k$ increases evenly from 1 to $m$. Taking Table 3 as an example, the data modules associated with the tasks in Table 3 are a subset of the data modules used by the Loop1 tasks, and the data modules in Loop2 are those produced by the Loop1 operations. The tasks in Table 3 therefore depend logically on those in Table 2: the tasks of Loop2 can be executed only after Loop1 is completed. In Loop2, each task performs read and write operations on its corresponding data modules. Table 4 follows the same idea as Table 3 but processes different data module units: the unit numbers are even in Table 3 and odd in Table 4.
Table 5 shows the task numbers and data modules of Loop4, where $k$ increases evenly from 1 to $m$.
The data modules of Loop4 combine the data modules of Loop2 and Loop3, so the tasks in Table 5 depend logically on those in Table 3 and Table 4, and Loop4 can be executed only after Loop2 and Loop3 are completed. In Loop4, each task performs read and write operations on its corresponding data module units.
The simulation then requires a task list similar to Table 2 to Table 5: an appropriate value of $m$ is chosen and the tables are constructed. System power is assessed in the simulation by summing $n \times$ Manhattan distance over the data-processing tasks.
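The Loop1 task list and the hop-based power indicator can be sketched as follows. The sketch covers only the Loop1 relation stated in the text (task $t_i$ uses the 10 consecutive data modules starting at $10(i-1)$); the placement dictionaries and the function names are hypothetical, and one access per data module is assumed by default.

```python
def manhattan(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def loop1_table(m, units_per_task=10):
    """Task t_i (1-based) -> the consecutive data module numbers it writes."""
    return {
        f"t{i}": list(range((i - 1) * units_per_task, i * units_per_task))
        for i in range(1, m + 1)
    }


def total_hops(table, task_tile, data_tile, accesses=1):
    """Sum of accesses * Manhattan distance over every task/data module pair."""
    return sum(
        accesses * manhattan(task_tile[t], data_tile[d])
        for t, units in table.items()
        for d in units
    )


# Hypothetical example with m = 2 on a 2 x 2 mesh.
table = loop1_table(2)
task_tile = {"t1": (0, 0), "t2": (1, 0)}
data_tile = {d: (0, 0) for d in range(10)}
data_tile.update({d: (0, 1) for d in range(10, 20)})
print(table["t2"])                              # data modules 10..19
print(total_hops(table, task_tile, data_tile))  # 0 for t1, 10 * 2 for t2 = 20
```

Moving the data modules 10-19 onto the tile of t2 would bring the indicator down to 0 in this example, which is the kind of gain the data unit mapping stage targets.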
B. Simulation Analysis
Fig. 1 (a) shows, from left to right, a three-group comparison of the data for m = 4, m = 16 and m = 36; the corresponding topologies are 2D meshes of sizes 2 * 2, 4 * 4 and 6 * 6.
Fig. 1 (b) shows, from left to right, a three-group comparison of the data for m = 6, m = 36 and m = 56; the corresponding topologies are 2D meshes of sizes 4 * 4, 8 * 8 and 12 * 12.
In Fig. 1, the solid black column is the energy consumption when all three mapping stages are used, the patterned column is the energy consumption when only the first and second stages are used, and the white column is the energy consumption when only the first stage is used. In Fig. 1 (a), the number of IP cores in the design equals the number of nodes in the topology, so every node of the 2D mesh has a corresponding IP core. Comparing the mapping performance with different stages applied, the column chart shows that IP core mapping and data unit mapping effectively reduce the power consumption of the system, and the optimization effect gradually improves as the number of tasks and the scale of the topology increase. In Fig. 1 (b), the number of available nodes in the topology is greater than the number of IP cores in the design, and the effect of the optimized mapping becomes increasingly apparent in this case. For m = 36, the effect is more pronounced on the 8 * 8 topology than on the 6 * 6 topology. That is, when the number of available nodes in the topology is greater than the number of IP cores in the design, the effect of the optimized mapping is more pronounced. Fig. 1 (b) also shows that the optimization effect increases gradually as the topology scale and the number of tasks increase.
Overall, the analysis shows that the low-power optimized mapping algorithm proposed in this paper can effectively reduce NoC system power consumption and meets the design expectations.
V. CONCLUSION
In a typical multi-processor system, the design process includes two important steps: task allocation and task scheduling. NoC design adds two further steps: mapping and path allocation. Comparing the mapping algorithms in the literature reveals their commonalities and differences: in NoC mapping optimization, the optimization models are established at a relatively simple level, and optimization is basically carried out at the communication level. In this paper, task allocation, task scheduling and mapping are combined to optimize the mapping from a more integrated perspective, and the idea from CMP is introduced and improved. The mapping is divided into three phases: task scheduling, IP core mapping and data unit mapping. The comparison of the simulation data shows that the mapping scheme is effective in reducing the power consumption of the system, so this mapping method provides an effective solution for low-power NoC mapping.
