A number of NoC architectures have been proposed, such as the mesh architecture, the octagon architecture, the torus architecture, and the Butterfly Fat-Tree (BFT) architecture.
Each of the NoC architectures may have its own advantages for some specific problems or tasks, but all of them are designed to achieve high throughput, low latency, low
Furthermore, the temperature information can also be utilized to achieve "temperature-aware task scheduling".
II. RELATED WORK
Past SoC designs predominantly use shared-medium bus-based functional interconnects to integrate IP blocks [3]. There are mainly three types of commercial bus-based SoC interconnect specifications:
Wishbone [5], and IBM CoreConnect [6]. Sensors are incorporated into the bus-based SoC to perform system monitoring and control. Chan et al. proposed using the power management bus (PMB) to exchange sensor information between IP cores, which yields intricate control and optimal management of the system [7]. Velusamy et al. proposed an FPGA-based SoC in which sensors are connected to the on-chip peripheral bus (OPB) to implement dynamic thermal management techniques [8]. The IBM Power6 architecture interconnects multiple sensors and actuators through a high-speed bus to perform voltage and thermal control [9].
Although effective for small numbers of cores, bus-based interconnect approaches are generally not scalable for increasing core counts [10].
III. SENOC SIMULATION PLATFORM
A. An Overview of SENoC
Our SENoC simulation platform is based on the mesh architecture. Besides the conventional NoC architecture components such as the routers, the processing units (PUs), and the network interfaces (NIs), we add multiple sensors to each PU to obtain run-time information about the hardware.
before the channel can be used by another message [3]. To overcome this drawback, we introduce virtual channels (VCs) [12] in the input and output ports, which increases channel utilization considerably. With virtual channels, even if a flit belonging to one virtual channel is blocked, the flits of the other virtual channels can still be transferred. As shown in Figure 2, there are four FIFO buffers in each input port and in each output port. Each buffer represents a virtual channel, so we have four virtual channels per physical channel.
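The port organization described above can be illustrated with a minimal sketch. It assumes a hypothetical buffer depth of four flits and a per-VC `blocked` flag standing in for downstream back-pressure; neither detail is specified in the text, and the class and method names are illustrative only.

```python
from collections import deque

NUM_VCS = 4  # four virtual channels per physical channel, as stated in the text


class Port:
    """One input or output port with one FIFO buffer per virtual channel."""

    def __init__(self, depth=4):  # depth is a hypothetical buffer size
        self.depth = depth
        self.vcs = [deque() for _ in range(NUM_VCS)]
        # Hypothetical flag: True when the downstream VC cannot accept flits.
        self.blocked = [False] * NUM_VCS

    def push(self, vc, flit):
        """Enqueue a flit on a VC; return False if that FIFO is full."""
        if len(self.vcs[vc]) >= self.depth:
            return False
        self.vcs[vc].append(flit)
        return True

    def sendable(self):
        """Indices of VCs that hold flits and are not blocked."""
        return [i for i in range(NUM_VCS) if self.vcs[i] and not self.blocked[i]]

    def pop(self, vc):
        """Remove and return the oldest flit of a VC."""
        return self.vcs[vc].popleft()
```

In this sketch, blocking VC 0 leaves the flits queued on the other VCs eligible for transfer, which is the property the virtual channels are introduced for.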
Our switching method differs slightly from the traditional VC method. We divide time into frames of 32 time slots each. In the traditional VC method, the number of time slots allocated to each VC is fixed. Even when a VC has no data to transfer in a frame, its time slots are still reserved and are therefore wasted. In our switching method, the router checks the output buffers before each frame, and only the non-empty VCs are transferred in the transfer period. For example, if VC1 and VC2 are non-empty while VC3 and VC4 are empty at the moment the router checks the buffers, then only VC1 and VC2 are transferred in this frame, as shown in Figure 4(b). The first slot contains "frame information" that informs the neighboring router which VCs are transferred.
In this manner, the empty virtual channels will not occupy the time slots, thus increasing performance to some extent.
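The frame construction described above can be sketched as follows. The text fixes the frame length (32 slots) and reserves the first slot for frame information; how the remaining 31 slots are shared among the non-empty VCs is not stated, so the round-robin interleaving below is an assumption, as are the bitmask encoding of the frame information and the function name `build_frame`.

```python
FRAME_SLOTS = 32  # slots per frame, from the text
NUM_VCS = 4       # virtual channels per physical channel, from the text


def build_frame(nonempty):
    """Build one frame from the VC status sampled before the frame starts.

    nonempty: list of NUM_VCS booleans, True if that VC holds data.
    Returns (mask, slots). mask is a hypothetical bitmask for the
    frame-information slot (bit i set means VC i is transferred);
    slots lists the VC index served in each remaining data slot.
    """
    active = [vc for vc in range(NUM_VCS) if nonempty[vc]]
    mask = sum(1 << vc for vc in active)
    if not active:
        return mask, []
    data_slots = FRAME_SLOTS - 1  # the first slot carries the frame information
    # Assumption: data slots are shared round-robin among the non-empty VCs.
    slots = [active[i % len(active)] for i in range(data_slots)]
    return mask, slots


# The example from the text: the first two VCs hold data, the last two are empty.
mask, slots = build_frame([True, True, False, False])
```

With two non-empty VCs, every data slot in this sketch goes to one of them, whereas a fixed allocation would leave the slots of the two empty VCs idle.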
For simplicity, we use the static "X-Y" routing protocol [13] to avoid deadlock. Our SENoC also supports priority-based data transfer by adding a priority bit in the header flit.
cycles for a 3 GHz core to achieve a thermal resolution of less than 0.1 °C [17]. Our system's operating frequency is 1 GHz, so the sampling rate is around one sample every 3,000 cycles. To evaluate the overhead of SENoC, we also try faster sampling rates in our experiments.
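The static X-Y routing mentioned above can be sketched in a few lines. This is a generic illustration of dimension-ordered routing on a mesh, not the platform's router code; the `(x, y)` coordinate convention and the function names are assumptions.

```python
def xy_next_hop(cur, dst):
    """Next node on the X-Y route from cur to dst, or None on arrival.

    Nodes are (x, y) tuples on a 2D mesh. A flit first travels along X
    until its column matches the destination, then travels along Y.
    """
    cx, cy = cur
    dx, dy = dst
    if cx != dx:
        return (cx + (1 if dx > cx else -1), cy)
    if cy != dy:
        return (cx, cy + (1 if dy > cy else -1))
    return None


def xy_path(src, dst):
    """Full list of nodes visited from src to dst, both included."""
    path = [src]
    while (nxt := xy_next_hop(path[-1], dst)) is not None:
        path.append(nxt)
    return path


print(xy_path((0, 0), (2, 1)))  # [(0, 0), (1, 0), (2, 0), (2, 1)]
```

Because every route finishes its X movement before starting its Y movement, no flit ever turns from the Y dimension back into the X dimension, which is why this scheme is deadlock-free on a mesh.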
We choose transport latency as the indicator of the overhead. Transport latency is defined as the time (in clock cycles) that elapses between the injection of a message header into the network at the source node and the reception of the tail flit at the destination node [18]. In order to reach the destination node from the source node, flits must travel through a path 
B. Simulation Results
Our purpose is to evaluate the overhead of the sensors on the MPSoC's performance. We try different sampling rates for the sensor data and record the average delays of regular data and sensor data separately. We also explore the influence of the SM's location on SENoC performance.
The SM is placed at three locations, as shown in Figure 6 (A, B, C). To be rigorous, all the cases are simulated under the same test bench. Our simulation results are shown in 
According to the results, it is best to place the SM in the center to achieve the lowest average delay of sensor data under uniform traffic. Our future work will focus on the scalability of our SENoC simulation platform, since the number of cores will increase to hundreds or even thousands. We will also study the best location of the SM under non-uniform traffic distributions.
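One intuition for the central placement is hop count: on a mesh with X-Y routing, the number of hops between two nodes equals their Manhattan distance, and the center minimizes the average distance to all other nodes. The sketch below is only a rough proxy for the simulated delay, since it ignores contention and buffering. It assumes a hypothetical 5 x 5 mesh and three hypothetical SM positions (corner, edge midpoint, center); the mesh size and the positions A, B, and C used in the experiments are not given in this section.

```python
def avg_hops_to_sm(n, sm):
    """Average Manhattan distance from every other node of an n x n mesh to sm."""
    sx, sy = sm
    total = sum(abs(x - sx) + abs(y - sy) for x in range(n) for y in range(n))
    return total / (n * n - 1)  # the SM node itself contributes distance 0


# Hypothetical placements on a hypothetical 5 x 5 mesh.
for name, pos in [("corner", (0, 0)), ("edge", (2, 0)), ("center", (2, 2))]:
    print(name, round(avg_hops_to_sm(5, pos), 2))
```

Under these assumptions the average hop count falls from the corner to the edge to the center, which agrees with the direction of the simulation results for uniform traffic.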
