I. INTRODUCTION
Moore's law paved the way for integrating many Intellectual Property blocks (IP blocks) on a single chip, called a System-on-Chip (SoC). An SoC integrates IP cores such as digital signal processors (DSPs) and memory blocks, and the semiconductor industry is working to integrate as many IP cores as possible on a single chip. Multiprocessor SoCs (MPSoCs) are emerging as high-speed computational systems, and this computational speed leads to heavy power dissipation [1]. In an MPSoC, the IP blocks communicate with each other through an on-chip network. An SoC that integrates a large number of IP cores needs high-bandwidth communication, and communication between the IP blocks is still a bottleneck. Traditional interconnects such as shared buses have also failed to provide an efficient communication system [2]. The communication delay between cores, called latency, plays an important role in transferring data from one core to another. Network-on-Chip (NoC) technology provides the fastest communication between IP cores: instead of sending signals directly from one core to another, it transfers packets to achieve high speed. A NoC comprises Processing Elements (also called IP cores or tiles), Network Interfaces (NIs), and routers that transfer packets from one processing element to another and back. In a NoC, many IP cores are integrated into a single chip using a network of routers. When designing a NoC, the designer must consider key mechanisms such as the network topology, routing algorithm, flow control mechanism (FCM), and switching technique. Routing protocols have a significant impact on the latency and power consumption of NoC-based systems, and routing algorithms must be adapted accordingly to avoid blocking of packets within the NoC.
An efficient routing algorithm that provides both fault tolerance and congestion tolerance can improve the overall performance of a NoC.
In our work, a 2D mesh topology is used for the NoC, as shown in Fig. 1, because it has many advantages over other topologies. In this 2D mesh, each router has four neighbors (north, east, south, and west), except for routers on the boundary: edge routers have three neighbors and corner routers have only two. Each router is connected to an IP core through a Network Interface (NI), and the physical connections between routers are bidirectional wires.
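To make the port counts concrete, the following Python sketch enumerates the mesh neighbors of each router in a width × height mesh. It assumes that x grows toward the East and y toward the North, and it ignores the local port that connects each router to its IP core through the NI; `neighbors` and `DIRECTIONS` are illustrative names, not part of the Dt-NoC design.

```python
from typing import Dict, Tuple

Coord = Tuple[int, int]

# Port offsets (dx, dy); assumes x grows toward East and y toward North.
DIRECTIONS: Dict[str, Coord] = {
    "north": (0, 1),
    "east": (1, 0),
    "south": (0, -1),
    "west": (-1, 0),
}


def neighbors(x: int, y: int, width: int, height: int) -> Dict[str, Coord]:
    """Return the mesh neighbors of router (x, y), keyed by port direction.

    The local port to the IP core (via the NI) is not included.
    """
    result: Dict[str, Coord] = {}
    for port, (dx, dy) in DIRECTIONS.items():
        nx, ny = x + dx, y + dy
        if 0 <= nx < width and 0 <= ny < height:  # drop ports that leave the mesh
            result[port] = (nx, ny)
    return result


if __name__ == "__main__":
    # Group the routers of a 4 x 4 mesh by their number of mesh neighbors.
    counts: Dict[int, int] = {}
    for x in range(4):
        for y in range(4):
            n = len(neighbors(x, y, 4, 4))
            counts[n] = counts.get(n, 0) + 1
    print(counts)
```

For the 4 × 4 mesh used in this work, it reports four corner routers with two neighbors, eight edge routers with three, and four interior routers with four.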
This work provides an in-depth study of routing algorithms in order to identify and rectify the key problems in current and next-generation many-core SoCs. The survey shows that conventional source routing carries considerable intrinsic header overhead, while distributed routing methods incur a route computation at every hop. In addition, these methods suffer from high power consumption, large latency, and poor throughput. To eliminate these drawbacks of NoCs, we developed a new CURVE-based technique called Docket-NoC (Dt-NoC) that improves these performance parameters.
Recent studies show that distributed routing is commonly used in NoCs because of its flexibility, whereas the major drawback of source routing is the size of its header [3]. Source routing also enlarges the source routing tables, which increases chip area. The proposed scheme reduces these disadvantages of source routing: for a 4 × 4-node processor, Dt-NoC needs only 2 docket bits for the entire routing process, and it routes without any routing table, which reduces area.
Related work
The literature survey shows that in source routing the packet header is very large, which increases latency. This baseline scheme is called Baseline XY routing; its header size depends on the size of the network, because the header of a source-routed packet contains the entire path from the source to the destination. To shrink the header flit, an encoding technique named EnA (also called EA) reduces the number of bits in the header flit of the packet. In this method, only two bits are used to represent each intermediate router.
The Network Interface between the IP core and the router provides the two bits per hop. EnA uses the turn-based model of [4], in which an incoming packet can choose any one of the four output ports by a 0°, 90°, 180°, or 270° turn. In this technique, the local port is considered as 0°, and a packet coming from one port never goes back out through the same port. Based on the data in the header, the router decides the rotation that sends the packet to the output port. Table 1 illustrates the encoding table used in the EnA method. The next method, Optimized Encoded Address (OEnA), compresses the header flit of incoming packets further than EnA. OEnA follows the same strategy as EnA, but after the header turns to the other dimension it uses only one bit per hop instead of two. All of these source routing algorithms (Baseline XY, EnA, and OEnA, the last also called OEA) add bits to the header for every router the packet traverses, which increases the number of bits in the header flit and thus the latency. Table 1 compares the EnA and OEnA codes before and after the turn. OEnA reduces the additional overhead of the EnA method by 25%.
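The difference between EnA and OEnA can be illustrated by counting routing bits along an XY path. This Python sketch is a simplified accounting model, not the published encoders: it assumes that every router on the path (source, intermediate routers, and destination) consumes one code, that EnA spends 2 bits per router, and that OEnA spends 2 bits per router up to and including the router where the packet turns from X to Y, then 1 bit per router afterwards. The actual codes of Table 1 are not reproduced.

```python
from typing import Tuple

Coord = Tuple[int, int]


def xy_hops(src: Coord, dst: Coord) -> Tuple[int, int]:
    """Hops travelled in X and then in Y under dimension-ordered XY routing."""
    return abs(dst[0] - src[0]), abs(dst[1] - src[1])


def ena_route_bits(src: Coord, dst: Coord) -> int:
    """EnA model: one 2-bit turn code for every router on the path.

    Assumption: source, intermediate and destination routers (the last one
    ejects to the local port) each consume one code.
    """
    x_hops, y_hops = xy_hops(src, dst)
    routers_on_path = x_hops + y_hops + 1
    return 2 * routers_on_path


def oena_route_bits(src: Coord, dst: Coord) -> int:
    """OEnA model: 2 bits per router up to the X-to-Y turn, 1 bit after it.

    After the turn a packet can only continue straight or eject locally,
    so one bit per router is assumed to be enough.
    """
    x_hops, y_hops = xy_hops(src, dst)
    before_turn = x_hops + 1  # source, X-dimension routers and the turning router
    after_turn = y_hops       # remaining Y-dimension routers, incl. the destination
    return 2 * before_turn + 1 * after_turn
```

Under these assumptions, a corner-to-corner route in a 4 × 4 mesh, from (0,0) to (3,3), needs 14 bits with EnA and 11 bits with OEnA, and both counts grow with the path length. That growth with network size is the overhead Dt-NoC is designed to avoid.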
Because OEnA adds only one bit per hop to the header flit after the turn, it reduces the power consumed by header bits. The authors of [5] used the OEnA methodology to build TNoC, but TNoC is not fully adaptive, fault tolerant, or fully deadlock free [6, 7].
This work draws on both source routing and distributed routing [8] to provide fast routing in NoC. The objective of the Dt-NoC architecture is to reduce latency by decreasing the number of bits in the packet header. Because its routing method is clearly formulated and implemented, Dt-NoC is deadlock and livelock free, as it uses an adaptive approach. Since the XY routing algorithm is simple and fast [9, 10], it is taken as the basic reference in this work [11]. Dt-NoC can also be reliably adapted to any topology.
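Since XY routing serves as the basic reference, a minimal Python sketch of dimension-ordered XY routing may help. It assumes integer (x, y) router coordinates in a 2D mesh; `xy_route` is an illustrative helper that returns every router visited, correcting the X coordinate fully before the Y coordinate.

```python
from typing import List, Tuple

Coord = Tuple[int, int]


def xy_route(src: Coord, dst: Coord) -> List[Coord]:
    """Dimension-ordered XY routing: correct X fully, then correct Y."""
    x, y = src
    path = [(x, y)]
    while x != dst[0]:  # X dimension first (East/West)
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:  # then Y dimension (North/South)
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path
```

Because every packet finishes its X movement before any Y movement, no Y-to-X turn can occur; this turn restriction is the usual argument for why XY routing is deadlock free on a 2D mesh.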
II. DESIGN OF DOCKET-NOC ROUTER
The Dt-NoC router has four components: a circular buffer, a header comparison unit, a docket generating unit, and a crossbar switch. The circular buffer stores the incoming packets and transfers them toward the destination, as illustrated in Fig. 2; a circular buffer system is used to optimize the buffering process [13]. The central unit of the Dt-NoC routing architecture is the header comparison unit, which compares the X/Y coordinates of the current router with the X/Y coordinates of the destination router, as shown in Fig. 3. The header comparison unit resides inside the Network Interface (NI), where it performs the comparison and produces the docket bits.
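A circular buffer is typically a fixed array of slots managed with a read (head) pointer and a write position that wrap around modulo the depth. The Python sketch below models that behavior for flits; the depth, the flit type, and the choice to signal a full buffer by returning False (a stand-in for back-pressure) are assumptions for illustration, not details taken from the Dt-NoC design.

```python
from typing import Generic, List, Optional, TypeVar, cast

T = TypeVar("T")


class CircularBuffer(Generic[T]):
    """Fixed-depth FIFO: a ring of slots addressed modulo the depth."""

    def __init__(self, depth: int) -> None:
        if depth <= 0:
            raise ValueError("depth must be positive")
        self._slots: List[Optional[T]] = [None] * depth
        self._head = 0   # slot holding the oldest flit (read pointer)
        self._count = 0  # number of occupied slots

    def is_empty(self) -> bool:
        return self._count == 0

    def is_full(self) -> bool:
        return self._count == len(self._slots)

    def push(self, flit: T) -> bool:
        """Write an incoming flit; return False if the buffer is full."""
        if self.is_full():
            return False
        tail = (self._head + self._count) % len(self._slots)  # write position
        self._slots[tail] = flit
        self._count += 1
        return True

    def pop(self) -> T:
        """Remove and return the oldest flit for forwarding."""
        if self.is_empty():
            raise IndexError("pop from an empty buffer")
        flit = self._slots[self._head]
        self._slots[self._head] = None
        self._head = (self._head + 1) % len(self._slots)  # wrap around
        self._count -= 1
        return cast(T, flit)
```

Flits leave in arrival order, and the pointers wrap back to slot 0 after the last slot, so stored data is never shifted. In hardware the same logic would be a register file with wrapping read and write pointers.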
When the header reaches a router, the header comparison unit compares the destination address (X, Y coordinates) in the header flit with the address (X, Y coordinates) of the current router. If the output of the comparison gate is '1', the packet must move in the North-South direction (Y axis); if the output is '0', the packet must move in the East-West direction (X axis). The second bit of the docket, denoted 'φ' as illustrated in Table 2, gives the exact direction: North or South, or East or West. The value of 'φ' is either '0' or '1', depending on the CURVE movement of the packet. When a packet leaves the source router on its way to the destination, the first two docket bits are generated; these docket bits direct the packet to the next router. Similarly, whenever the packet crosses an intermediate router, two new docket bits are generated and replace the previous docket bits to show the path to the destination.
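The docket generation described above can be sketched as a small Python function. The axis bit follows the text ('0' for East-West, '1' for North-South), and it assumes the comparison gate outputs '1' once the X coordinates already match, so X is resolved before Y as in the XY reference. The mapping of the second bit φ is also an assumption, since Table 2 is not reproduced here: this sketch uses '1' for the positive direction (East or North, with y growing northward) and '0' for the negative direction. The 'ZZ' value at the destination follows the 4-node example.

```python
from typing import Tuple

Coord = Tuple[int, int]


def docket_bits(current: Coord, dest: Coord) -> str:
    """Return the 2-bit docket for the next hop, or "ZZ" at the destination.

    Axis bit (from the text): '0' -> East-West (X), '1' -> North-South (Y).
    Phi bit (ASSUMED mapping): '1' -> East/North, '0' -> West/South.
    """
    (xc, yc), (xd, yd) = current, dest
    if xc != xd:  # X coordinates differ: move along the X axis
        return "0" + ("1" if xd > xc else "0")
    if yc != yd:  # X matches, Y differs: move along the Y axis
        return "1" + ("1" if yd > yc else "0")
    return "ZZ"   # packet has reached its destination
```

With this convention, a packet at (0,0) heading for (1,1) receives docket '01' (move East), and at (1,0) it receives '11' (move North).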
Consider the IP core of router (X_i, Y_i) sending a message from its local router to a destination router (X_j, Y_j); the packet is routed according to the Dt-NoC algorithm illustrated in Table 2. If X_i ≠ X_j, the router compares the X coordinate of the current router with the X coordinate of the destination router (X_i < X_j or X_i > X_j), and the packet traverses the mesh as per the algorithm. Fig. 6 shows a 16-node, 4 × 4, 2D multicast mesh network in which the Dt-NoC routing method is applied to three scenarios; it illustrates the path of the packets with reference to the docket bits generated. The data packet carries data flits [14, 15], the coordinates (X and Y) of the current router, and the docket bits. Each router is supplied with the X and Y coordinates of the destination router. Once the packet reaches the first current router, the router compares its X coordinate with the X coordinate of the destination router. If the X coordinates of the current and destination routers differ, it checks which coordinate is larger, and Dt-NoC proceeds according to its algorithm as shown in Table 3. If the X coordinates of the current and destination routers are the same, it then checks the Y coordinates. If these are also the same, the packet has reached its destination; if they differ, the packet moves according to the Dt-NoC algorithm.
To illustrate the Dt-NoC routing scheme, a 4-node SoC is considered and analyzed by taking each node in turn as the source and the remaining nodes as destinations. Here the source node is (0,0) and the destination node is (1,1). According to the Dt-NoC algorithm, the packet takes the route shown by dotted lines in Fig. 7. Initially the docket is denoted 'XX', and it changes as soon as the router decides the first movement. The docket becomes 'ZZ' when the packet reaches the destination.
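The 4-node walkthrough can be traced hop by hop. The Python sketch below uses an assumed docket convention (axis bit '0' for X and '1' for Y, with φ assumed to be '1' for East or North), starts every trace at 'XX', ends it at 'ZZ', and assumes that the route in Fig. 7 follows the X-before-Y order described in the text; `next_hop` and `trace` are hypothetical helpers.

```python
from typing import List, Tuple

Coord = Tuple[int, int]


def next_hop(current: Coord, dest: Coord) -> Tuple[str, Coord]:
    """One Dt-NoC step: (docket, next router). The phi mapping is ASSUMED."""
    (xc, yc), (xd, yd) = current, dest
    if xc != xd:  # resolve X first
        step = 1 if xd > xc else -1
        return "0" + ("1" if step > 0 else "0"), (xc + step, yc)
    if yc != yd:  # then Y
        step = 1 if yd > yc else -1
        return "1" + ("1" if step > 0 else "0"), (xc, yc + step)
    return "ZZ", current


def trace(src: Coord, dest: Coord) -> List[Tuple[Coord, str]]:
    """List (router, docket) pairs from source to destination."""
    steps: List[Tuple[Coord, str]] = [(src, "XX")]  # before the first decision
    current = src
    while True:
        docket, nxt = next_hop(current, dest)
        steps.append((current, docket))  # new docket replaces the previous one
        if docket == "ZZ":
            return steps
        current = nxt


if __name__ == "__main__":
    nodes = [(x, y) for x in range(2) for y in range(2)]  # 4-node (2 x 2) SoC
    for src in nodes:
        for dst in nodes:
            if src != dst:
                print(src, "->", dst, trace(src, dst))
```

Under these assumptions, the trace from (0,0) to (1,1) is 'XX' at (0,0), then '01' at (0,0), '11' at (1,0), and 'ZZ' at (1,1). The `__main__` loop repeats this for every source and destination pair of the 4-node SoC.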
III. RESULTS AND PERFORMANCE ANALYSIS
A 16-node, 4 × 4 multicast mesh network with interconnection links is implemented in VHSIC HDL (VHDL) to obtain experimental results. The power consumption, latency, and area of each architecture are obtained by combining cycle-accurate RTL router simulation with VHDL synthesis in CADENCE.
The overall performance of combining the source and distributed routing methods is much better than that of the baseline XY scheme for both fixed and variable network sizes [16] [17] [18]. Fig. 8 plots the latency (in nanoseconds) of each routing scheme under various traffic patterns. At high injection rates, the overall performance of the EnA, OEnA, and Dt-NoC routing algorithms is better than that of the baseline XY method [19]. Fig. 8 also shows that EnA and OEnA have almost the same latency, while Dt-NoC provides better latency than the other techniques for all traffic types. Table 3 lists the power consumption of the 16-node mesh network for all the schemes discussed here; the results show that the Dt-NoC routing algorithm with the Dt-NoC architecture consumes less power than the other schemes. Fig. 9 compares power consumption against the corresponding clock period.
The Dt-NoC architecture consumes approximately 33.75%, 27.65%, and 24.85% less power than the Baseline, EnA, and OEnA architectures, respectively. Table 4 lists the clock period, frequency, and area of the baseline XY, EnA, OEnA, and Dt-NoC methods, calculated using CADENCE software. Because EnA and OEnA add binary bits to the header, they have a longer clock period, lower frequency, and larger area than Dt-NoC. The simulation shows that Dt-NoC is the fastest technique, with a clock period 1.84%, 10.59%, and 6.06% shorter than that of baseline XY, EnA, and OEnA, respectively.
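The clock-period figures above translate directly into frequency, since f = 1/T: if the period is shorter by a fraction r, the maximum frequency is higher by a factor 1/(1 - r). The Python sketch below applies this relation to the reported reductions; it derives numbers from the percentages in the text and is not a substitute for the measured frequencies in Table 4.

```python
def frequency_gain_percent(period_reduction_percent: float) -> float:
    """Percent increase in clock frequency implied by a shorter clock period.

    Uses f = 1 / T: if T_new = T_old * (1 - r), then f_new / f_old = 1 / (1 - r).
    """
    r = period_reduction_percent / 100.0
    if not 0.0 <= r < 1.0:
        raise ValueError("reduction must be in [0, 100)")
    return (1.0 / (1.0 - r) - 1.0) * 100.0


if __name__ == "__main__":
    # Clock-period reductions of Dt-NoC as reported in the text.
    for name, reduction in [("Baseline XY", 1.84), ("EnA", 10.59), ("OEnA", 6.06)]:
        gain = frequency_gain_percent(reduction)
        print(f"vs {name}: {gain:.2f}% higher frequency")
```

Shortening the period by 1.84%, 10.59%, and 6.06% corresponds to roughly 1.87%, 11.84%, and 6.45% higher frequency, respectively; because the gain equals r/(1 - r), it is always larger than the period reduction.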
Because the Dt-NoC scheme eliminates the routing table, it occupies 14.29%, 8.22%, and 7.57% less area than baseline XY, EnA, and OEnA, respectively. Table 5 lists the number of header bits needed by each approach: Dt-NoC needs only 10 bits for the entire process. The address of each router is 8 bits, so the 8-bit destination address plus the 2 docket bits gives a 10-bit header.
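The 10-bit header can be illustrated with a bit-packing sketch. The actual field layout is not given here, so this Python example assumes a hypothetical layout: the 8-bit destination address split into a 4-bit X field and a 4-bit Y field, followed by the 2 docket bits in the least-significant positions. The hypothetical `update_docket` models the replacement of the previous docket bits at each router.

```python
from typing import Tuple

# HYPOTHETICAL layout: [X:4][Y:4][docket:2], docket in the low-order bits.
COORD_BITS = 4
DOCKET_BITS = 2
HEADER_BITS = 2 * COORD_BITS + DOCKET_BITS  # 8-bit address + 2 docket bits = 10
COORD_MASK = (1 << COORD_BITS) - 1
DOCKET_MASK = (1 << DOCKET_BITS) - 1


def pack_header(x: int, y: int, docket: int) -> int:
    """Pack destination (x, y) and a 2-bit docket into a 10-bit header."""
    if not (0 <= x <= COORD_MASK and 0 <= y <= COORD_MASK):
        raise ValueError("coordinate does not fit in 4 bits")
    if not 0 <= docket <= DOCKET_MASK:
        raise ValueError("docket must fit in 2 bits")
    return (x << (COORD_BITS + DOCKET_BITS)) | (y << DOCKET_BITS) | docket


def unpack_header(header: int) -> Tuple[int, int, int]:
    """Inverse of pack_header: return (x, y, docket)."""
    docket = header & DOCKET_MASK
    y = (header >> DOCKET_BITS) & COORD_MASK
    x = (header >> (COORD_BITS + DOCKET_BITS)) & COORD_MASK
    return x, y, docket


def update_docket(header: int, docket: int) -> int:
    """Replace the 2 docket bits, leaving the destination address untouched."""
    if not 0 <= docket <= DOCKET_MASK:
        raise ValueError("docket must fit in 2 bits")
    return (header & ~DOCKET_MASK) | docket
```

Only the two low-order bits change from router to router in this layout; the 8-bit destination address is written once at the source.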
IV. CONCLUSION
The analysis shows that source routing algorithms add a large amount of information overhead to the packet header. In this work, Dt-NoC is formulated mathematically, implemented, and analysed in terms of power consumption, latency, maximum frequency, and area overhead. Dt-NoC overcomes the drawbacks of both source routing and distributed routing algorithms, making it the most efficient of the methods considered. For the NoC communication infrastructure, a wired 16-node 2D-mesh NoC architecture was designed, coded in VHDL, and simulated using CADENCE with TSMC 18 nm technology to obtain the results. Under traffic run at different injection rates, Dt-NoC performs best compared with the baseline and the other proposed methods, while imposing only two extra bits relative to the baseline. The cycle-accurate RTL simulation also shows Dt-NoC to be the fastest of the baseline, EnA, and OEnA techniques.
