Abstract: 3D Network on Chip (NoC) has emerged as a new platform to meet the performance requirements and scaling challenges of System on Chip. Further investigation is needed to address the challenges of multiport topologies and to minimize the footprint of nodes and interconnection wires. This paper discusses multi-port NoC topologies and routing in 2D hexagonal and 3D mesh NoC. Deadlock-free routing for the 2D hexagonal mesh topology is compared with ZXY routing in a 3D mesh with a similar number of nodes. The routing algorithms show promising results, making the architecture and algorithm more suitable for future NoC design.
INTRODUCTION
Adding new ports to the router can overcome the problems of limited bandwidth and scaling. In a conventional 2D mesh NoC, one or two ports can be added to the intermediate tiles, as shown in Figure 1 and Figure 2, to make a 2D hexagonal mesh. A single diagonal link is required to increase from four to six port interconnections. Figure 3 shows a conventional 3 x 3 x 3 mesh topology. A 3D NoC architecture is constructed from multiple uniform silicon planes, and each active plane is a 2D mesh topology connected to the others by vertical links or interconnection wires. Efficient routing is required to explore all the available paths and to optimize performance in terms of throughput, latency and energy. The NoC design must be simple, and short-path routes are preferred for low latency and power dissipation. Figure 2 shows examples of a 7-port Hexagonal mesh router and a 6-port Diagonalized mesh router respectively. Recent developments at the nanoscale have opened up an alternative to the conventional on-chip communication network: uniform stackable multi-chip modules (MCM) in three dimensions using through silicon vias (TSV). The transition from 2D to 3D NoC is made by homogeneously distributing the tiles onto the different layers of the 3D NoC. The design of a homogeneous network on distinct layers using heterogeneous floor plans is explained in [1]. The router-assignment-based design methodology places the processing elements (PE) on the first layer and their minimized connections to the routers on the second layer. Application-specific NoC design with optimized power consumption and minimum area is addressed in [2], using a rip-up and reroute procedure for routing flows and a router merging procedure to optimize a given network topology.
In [3], virtual channels are used to achieve fault tolerance in k-ary n-cube topologies. The method uses $O(2^n)$ virtual channels for a fully adaptive fault-tolerant routing algorithm. A planar-adaptive routing algorithm is introduced in [4] to reduce the number of virtual channels in an n-D mesh. This routing is partially adaptive: the routing process is divided into a sequence of phases, and packets are forwarded in two dimensions within each phase.
The Simulation Allocation (SAL) method [5] is used to determine the most suitable network topology for a given 3D NoC application while optimizing the power, the network latency and the data traffic path. The scalability of 2D and 3D NoC (mesh and bus topology) is addressed in [6], based on a buffer-less hot-potato routing algorithm for the mesh topology, while the bus protocol incorporates a centrally arbitrated least-served-first priority scheme. The emergence of 3D IC technology enabled short vertical connections between dies by means of Through Silicon Vias (TSV) [7]. The main challenge for 3D NoC is thermal dissipation: as dies are stacked vertically, the power density and the length of the heat dissipation path increase, resulting in high temperature and longer propagation delay. Leakage power also adversely affects system reliability. Vertical stacking and gluing of wafers for 3D node integration is presented in [8]. This paper is organized as follows: Section 2 discusses the conventional 2D hexagonal and 3D mesh NoC architectures. Section 3 discusses the crossbase (CB) routing algorithm. Section 4 presents simulations and a comparative analysis of routing algorithms for 2D Hexagonal and 3D mesh NoC. Section 5 gives a detailed analysis of the experimental results, and Section 6 concludes the paper.
http://dx.doi.org/10.12785/ijcds/040104 http://journals.uob.edu.bh
Wire Cost:
Consider n system modules interconnected in a NoC, where the cost of each module does not change. Let a unit length of wire be used between all the interconnected routers in the NoC, and let the number of interconnecting wires in each link be constant and equal to w. Then the cost of a 2D mesh NoC, in terms of wire length, can be given as:
Assuming unit wiring cost w for all links in the network, i.e., w = 1, the costs for different network topologies are listed in Table 1.
Scaling:
The number of nodes increases with an increase in dimension; this is referred to as scaling. Hexagonal and Diagonalize mesh have a similar structure and increase by a factor of 2n, whereas the k-ary n-dimension Fat tree increases by a factor of 2k.
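As a rough illustration of the wire-cost comparison, the sketch below counts links for the two network sizes used later in the experiments. It is not the cost expression of this paper or the contents of Table 1. It assumes that every link, including a diagonal one, counts as one unit of length, that each link carries w wires, and that the hexagonal mesh adds one diagonal link per mesh cell. All function names are hypothetical.

```python
def mesh_2d_links(rows: int, cols: int) -> int:
    """Number of horizontal plus vertical links in a rows x cols 2D mesh."""
    return rows * (cols - 1) + cols * (rows - 1)


def diagonal_links(rows: int, cols: int) -> int:
    """Assumption: one diagonal link per mesh cell (single diagonal direction)."""
    return (rows - 1) * (cols - 1)


def mesh_3d_links(x: int, y: int, z: int) -> int:
    """Number of links in an x * y * z 3D mesh, one term per dimension."""
    return (x - 1) * y * z + x * (y - 1) * z + x * y * (z - 1)


def wire_cost(links: int, w: int = 1) -> int:
    """Assumption: unit length per link, w wires per link."""
    return links * w


# 8 x 8 hexagonal mesh (2D mesh plus diagonals) versus 4 x 4 x 4 3D mesh
hex_cost = wire_cost(mesh_2d_links(8, 8) + diagonal_links(8, 8))
mesh3d_cost = wire_cost(mesh_3d_links(4, 4, 4))
print(hex_cost, mesh3d_cost)
```

Under these assumptions the 64-node hexagonal mesh uses slightly more links than the 64-node 3D mesh; a model that charges diagonal links a longer length would widen that gap.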
Router Architecture for Hexagonal and Diagonalize mesh
The most important part of designing a multiport NoC is the design of the on-chip router. The complexity of the router architecture increases with the number of ports. A symmetrical router architecture is desirable to ease the fabrication process. Figure 4 shows a typical router architecture for Hexagonal and Diagonalize mesh interconnection. The router consists of a simple crossbar switch, input ports, output ports and a programmable controller or arbiter. In a multi-layer architecture, power-consuming nodes are not stacked on top of each other, in order to reduce the on-chip thermal power. Usually the power-consuming nodes are put in the top layer and the low-power nodes are stacked on top of each other. Compared with a 2D design, a 3D router requires two additional physical ports to connect the UP and DOWN links for inter-layer transfers. The size of the crossbar switch remains the same for 2D hexagonal and 3D mesh. The number of input/output ports and the flit length determine the size and power consumption.
A Hexagonal or Diagonalize mesh structure can be constructed if the routers are designed and connected properly [8]. Figures 7 and 8 show the hexagonal and diagonalize mesh structures using the 7-port and 6-port routers respectively. For the sake of simplicity, we assume that cross links are routed across the IP CORE and that, in the Diagonalize mesh, two IP COREs can be fitted in the space surrounded by routers.
DEADLOCK FREE ROUTING ALGORITHM
A deadlock occurs if a flit cannot find the next input channel on the way to its destination, or if the queues in the alternative output channels supplied by the routing function are full. In that case the routing function $R(d_i, head(c_i))$ may not forward the header flit to any adjacent output channel, and the data and tail flits are blocked because their header is in a full queue in the next channel. In a deadlocked situation, no header flit of any message can reach its destination.
Strategy for Deadlock Free Routing
For designing deadlock-free routing algorithms for NoC, the following assumptions are made [9]:
2. A message that arrives at its destination is eventually consumed.
3. A node can generate a message of arbitrary length, but the message must be longer than a single flit.
4. When an input buffer of a router port accepts the first flit of a message under wormhole routing, it must accept all the remaining flits of that message before accepting any flit from any other message.
5. The buffer size must be the same in all input ports.
6. An available buffer is arbitrated only among the messages requesting it; messages already waiting are not included in the arbitration.
7. A buffer must hold flits belonging to the same message only. Moreover, it must be emptied before accepting any other flit from any adjacent node.
8. For adaptive routing, the path taken by a message depends on the availability status of the output ports.
Figure 3 Router for (a) Hexagonal mesh (b) Diagonalized mesh
An adaptive routing function R in a connected network is deadlock free if:
1. An order between the channels is established to route a packet such that there is no cycle in its channel dependency graph.
2. There exists a subset of channels c ⊆ C such that there is no cycle in its extended dependency graph of the connected network.
3. Virtual channels can be added to provide more paths between all source-destination pairs. The added virtual channel's sub graph is also acyclic to ensure deadlock free routing.
Figure 4 Interconnected Diagonalize mesh NoC using 6 ports
Suppose that there is a deadlock configuration for routing function R. Let $c_i$ be a channel with a nonempty queue such that no channel less than $c_i$ has a nonempty buffer. If $c_i$ is minimal, then the flit at the head of its buffer can reach its destination in a single hop and there is no deadlock. Otherwise, the flit at the buffer head of $c_i$ can advance using channels less than $c_i$, and again there is no deadlock.
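The acyclicity condition on the channel dependency graph can be checked mechanically. The following is a minimal sketch, not part of the original work: it assumes the dependency graph is given as a dictionary that maps each channel to the channels it may wait on, and it uses a standard depth-first search to look for a cycle. The function name `has_cycle` and the channel labels are hypothetical.

```python
from typing import Dict, Hashable, Iterable


def has_cycle(dependencies: Dict[Hashable, Iterable[Hashable]]) -> bool:
    """Return True if the channel dependency graph contains a cycle."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current DFS path, finished
    colour: Dict[Hashable, int] = {}

    def visit(channel: Hashable) -> bool:
        colour[channel] = GREY
        for nxt in dependencies.get(channel, ()):
            state = colour.get(nxt, WHITE)
            if state == GREY:  # back edge: a cyclic wait is possible
                return True
            if state == WHITE and visit(nxt):
                return True
        colour[channel] = BLACK
        return False

    return any(
        colour.get(ch, WHITE) == WHITE and visit(ch) for ch in list(dependencies)
    )


# Four channels around a ring, each waiting on the next: a cyclic dependency.
ring = {"c0": ["c1"], "c1": ["c2"], "c2": ["c3"], "c3": ["c0"]}
# Removing the dependency c3 -> c0 leaves an ordered, acyclic graph.
ordered = {"c0": ["c1"], "c1": ["c2"], "c2": ["c3"], "c3": []}
print(has_cycle(ring), has_cycle(ordered))
```

A routing function whose dependency graph passes this check satisfies the first condition above; the check says nothing about livelock.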
Figure 5 Interconnected Hexagonal mesh NoC using 7 ports

Adaptive Routing
Definition: For a network graph $G = (N, C)$, where N is the set of nodes and C is the set of communication channels, an adaptive routing algorithm can be defined as a subset of routing algorithms applied to a subset of the graph that satisfies the following conditions [10]:
1. In a given graph $G$, consisting of input channels $c$ and output channels $\tilde{c}$, i.e., all the edges connecting the channels from source to destination, a packet from source s starts from an output channel $\tilde{c}$ towards destination d along a path P given by the sequence $n_1, (s_1, a_2), n_2, (s_2, a_3), \ldots, n_{k-1}, (s_{k-1}, a_k)$, where $n_i$ are channel nodes and $(s_i, a_{i+1})$ are channels, for $1 \le i \le k-1$.
2. If a path from source to destination exists via input and output channels, where $\tilde{c}$ is the output channel edge, then a packet p will reach destination d using routing algorithm $R_p$, subject to the condition that there is no deadlock in the channel dependency graph.
An adaptive routing algorithm is capable of handling congestion without creating a deadlock. Under congestion, a packet has to wait until a link through which it can be routed becomes available. One major design criterion for an adaptive algorithm is that it should be deadlock free. This is ensured either by restricting turns or by ensuring that the channel dependency graph has no cycles [12]. Adaptive routing may follow a non-minimal path.
Restricting turns ensures deadlock-free routing, but the number of shortest paths the algorithm allows from source to destination (also known as the degree of adaptiveness) may vary.
The routing algorithm decides which channel is selected to route a packet from source to destination. The routing strategy must be easy to implement and should deliver low latency and better throughput. The wormhole technique is the most suitable for NoC, but it may cause deadlock or livelock conditions. Breaking the cycles in the channel dependency graph makes the routing deadlock free. The ZXY and Crossbase routing algorithms are deadlock free without requiring any virtual channels, as their prohibited turns do not form any cycle in the dependency graph or the extended channel dependency graph [13].
Use all VCs for all horizontal links.
In ZXY routing, a packet is first routed in the slice (Z) dimension, then moves along the rows (X), and finally moves along the columns (Y) toward the destination. Figure 2 shows an example of ZXY routing.
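The dimension order described above can be sketched in a few lines. This is an illustrative sketch rather than the simulator's implementation: it assumes nodes are addressed as (x, y, z) integer tuples in a 3D mesh and that the packet resolves the Z offset first, then X, then Y. The names `zxy_next_hop` and `zxy_path` are hypothetical.

```python
from typing import List, Tuple

Node = Tuple[int, int, int]  # assumed (x, y, z) mesh coordinates


def zxy_next_hop(cur: Node, dst: Node) -> Node:
    """Return the next node on a ZXY dimension-ordered route."""
    x, y, z = cur
    dx, dy, dz = dst
    if z != dz:  # resolve the slice (Z) offset first
        return (x, y, z + (1 if dz > z else -1))
    if x != dx:  # then the X offset
        return (x + (1 if dx > x else -1), y, z)
    if y != dy:  # finally the Y offset
        return (x, y + (1 if dy > y else -1), z)
    return cur  # already at the destination


def zxy_path(src: Node, dst: Node) -> List[Node]:
    """List every node visited from src to dst, both included."""
    path = [src]
    while path[-1] != dst:
        path.append(zxy_next_hop(path[-1], dst))
    return path


print(zxy_path((0, 0, 0), (2, 1, 3)))
```

Because a packet never returns to a dimension it has already finished, the route is minimal and its hop count equals the sum of the absolute offsets in the three dimensions.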
Crossbase Routing Algorithm
Crossbase routing is developed to use the diagonal links available among the interconnecting paths between nodes. The additional path provides a shorter link between nodes, as packets can move directly to another node, bypassing the node in the same row or column and thus reducing delay. If links are available on the path from source to destination, preference is given to the diagonal link provided it has not been reserved; otherwise the normal XY routing strategy is adopted. The detailed algorithm is presented in Algorithm 1.
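The preference for the diagonal link with an XY fallback can be sketched as follows. This is not a reproduction of Algorithm 1; it is a minimal sketch under stated assumptions: nodes are (x, y) tuples, diagonal links exist in one direction only, joining (x, y) with (x+1, y+1), and `reserved` is a hypothetical set of directed links that are currently unavailable. The function names are also hypothetical.

```python
from typing import FrozenSet, List, Tuple

Node = Tuple[int, int]    # assumed (x, y) coordinates
Link = Tuple[Node, Node]  # directed link (from_node, to_node)


def crossbase_next_hop(cur: Node, dst: Node,
                       reserved: FrozenSet[Link] = frozenset()) -> Node:
    """Prefer an unreserved diagonal link, otherwise fall back to XY routing."""
    cx, cy = cur
    dx, dy = dst
    sx = (dx > cx) - (dx < cx)  # sign of the remaining X offset
    sy = (dy > cy) - (dy < cy)  # sign of the remaining Y offset
    # Assumption: the diagonal helps only when both offsets have the same sign.
    if sx != 0 and sx == sy:
        diag = (cx + sx, cy + sy)
        if (cur, diag) not in reserved:
            return diag
    if sx != 0:  # XY fallback: X first
        return (cx + sx, cy)
    if sy != 0:  # then Y
        return (cx, cy + sy)
    return cur   # already at the destination


def crossbase_path(src: Node, dst: Node,
                   reserved: FrozenSet[Link] = frozenset()) -> List[Node]:
    """List every node visited from src to dst, both included."""
    path = [src]
    while path[-1] != dst:
        path.append(crossbase_next_hop(path[-1], dst, reserved))
    return path


print(crossbase_path((0, 0), (3, 2)))
print(crossbase_path((0, 0), (3, 2), frozenset({((0, 0), (1, 1))})))
```

In this sketch a route from (0, 0) to (3, 2) takes three hops with the diagonal links, where plain XY routing would take five. The sketch illustrates path selection only and does not by itself establish deadlock freedom.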
Experimental setup
The routing algorithms are initialized and simulation experiments are run on the NoC using the NOXIM real time simulator [14], written in SystemC. Random and transpose traffic patterns are simulated with random data on the 4 × 4 × 4 3D mesh NoC and the 8 × 8 2D hexagonal mesh network. Flow control units (flits) of size 10 bytes, with a two-byte header and an eight-byte data payload, are generated while the packet injection rate (PIR) [15] is increased from 0.01 to 0.98 in steps of 0.001 over the proposed topology, and each run is simulated for 10000 cycles. The delay due to the increased length of the crosslinks is also considered when estimating the global average delay.
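The parameters of this setup can be written down and sanity-checked in a short script. The sketch below is independent of the simulator and does not use any of its options; all variable names are hypothetical, and the flit sizes assume that the header and payload figures above are both in bytes.

```python
# Network sizes compared in the experiments
nodes_3d = 4 * 4 * 4  # 4 x 4 x 4 3D mesh
nodes_2d = 8 * 8      # 8 x 8 2D hexagonal mesh

# Flit layout (assumption: sizes in bytes)
header_bytes = 2
data_bytes = 8
flit_bytes = header_bytes + data_bytes

# PIR sweep from 0.01 to 0.98 in steps of 0.001, built from integers
# so that floating-point rounding does not accumulate across steps
pir_values = [i / 1000 for i in range(10, 981)]

cycles_per_run = 10000

print(nodes_3d == nodes_2d, flit_bytes, len(pir_values))
```

Both networks have 64 nodes, which is what makes the comparison like for like, and the sweep contains 971 injection-rate points per routing algorithm and traffic pattern.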
Crossbase Routing for Hexagonal NoC
Require: $(S_x, S_y)$, $(C_x, C_y)$, $(D_x, D_y)$: coordinates of the source, current and destination node respectively. $dir_X$, $dir_Y$, $dir_{x,y}$: horizontal, vertical and diagonal directions.
Analysis of experimental results
Experiments are run with the same number of nodes for the 2D Hexagonal and 3D mesh NoC, under random and transpose traffic. As congestion increases, the 3D NoC outperforms the 2D NoC. Average delay, maximum delay, throughput and total energy are the four parameters used in this work to evaluate the performance of the CB and ZXY routing algorithms for random and transpose traffic on 2D Hexagonal and 3D mesh NoC. Figures 8(a) and 8(b) show that, under random traffic at high congestion, ZXY routing achieves lower average and maximum delay than CB routing. A more realistic pattern is obtained with transpose traffic, where CB routing shows lower latency up to a Packet Injection Rate (PIR) of 5 percent, as depicted in Figure 9(a). This is due to the simplicity and the available cross-links of the 2D NoC.
As CB routing is adaptive in nature, the maximum throughput can be achieved at higher congestion represented by the higher PIR in Figure 9 
Conclusions
Our aim is to explore the 2D hexagonal mesh and the 3D mesh and to evaluate their performance using simple routing algorithms. For small PIR the overall performance of the 2D hexagonal mesh is better than that of the 3D mesh, while under increased load the 3D mesh is more suitable owing to its uniform architecture and the equidistance of its nodes. Power consumption is a critical issue in NoC design. The non-stackable 2D architecture of the Hexagonal mesh is better than the 3D NoC in this respect, as it consumes significantly less power, making it more suitable for NoC design. Thus we can state that CB routing gives more promising results than the ZXY routing technique. The routing and traffic configurations identified in this paper will help to achieve minimum latency with maximum throughput in NoC design.
