Abstract-Three-dimensional (3D) integration offers greater device integration, reduced signal delay, and reduced interconnect power. It also provides greater design flexibility by allowing heterogeneous integration. The stacked mesh 3D NoC architecture was proposed to take advantage of the intrinsic capability of 3D ICs to reduce wire length. However, this architecture still exacerbates the on-chip power density and router cost. In this paper, we propose a novel hybridization scheme for inter-layer communication using efficient 5-input routers to enhance the overall system power, performance, and area characteristics of the existing Hybrid NoC-Bus 3D mesh architecture. By defining a rule for routing algorithms called LastZ, the proposed area-efficient architecture decreases the overall average hop count of a NoC-based system compared to the existing architectures. We further improve this design by proposing a partial-LastZ-based 3D NoC-bus hybrid architecture that provides adaptivity for implementing congestion-aware and fault-tolerant inter-layer routing algorithms. Extensive quantitative experiments demonstrate up to 16% performance improvement compared to the full LastZ-based 3D NoC-bus hybrid architecture and around 20% area reduction compared to the typical Hybrid NoC-Bus 3D mesh architecture.
I. INTRODUCTION
Network-on-Chip (NoC) is a general concept proposed for complex on-chip communication because of its scalability, better throughput, and reduced power consumption [1]. However, increasing the number of cores over a 2D plane is not efficient enough due to long interconnects. The advent of 3D silicon integration technology has opened a new horizon for on-chip interconnect design innovations. In 3D integration technologies, multiple layers of active devices are stacked above each other and vertically interconnected using Through-Silicon Vias (TSVs) [2]. Comparisons of 2D and 3D NoC architectures show that 3D NoCs deliver better system performance with significantly lower energy per packet than 2D implementations, due to increased package density and shorter wires [3].
The straightforward extension of the popular planar 2D NoC structure is the 3D Symmetric NoC, created by simply adding two physical ports to each router: one for Up and one for Down [4]. Despite its simplicity, this architecture has two major inherent drawbacks. Firstly, it does not exploit the beneficial feature of a negligible inter-wafer distance in 3D chips, because inter-layer and intra-layer hops are indistinguishable in this architecture. Secondly, a considerably larger crossbar is required as a result of the two extra ports [5].
The stacked mesh 3D NoC (Hybrid NoC-Bus 3D mesh) architecture presented in [6] is a hybrid between the packet-switched network and the bus architecture, intended to overcome the mentioned 3D Symmetric NoC challenges. It integrates multiple layers of 2D mesh networks by connecting them with a bus spanning the entire vertical distance of the chip. As the inter-layer distance in 3D ICs is small, the bus is also short, which makes it suitable for inter-layer communication in the vertical direction. With the stacked mesh architecture, a six-port router is required instead of the seven-port router of a typical 3D NoC, and any destination layer is just one vertical hop away. However, the additional input port still imposes considerable extra logic on a NoC router, especially when complex routers with support for Virtual Channels (VCs) and load management schemes are required.
In this paper, an efficient hybridization scheme for inter-layer communication is presented in order to enhance the overall system power, performance, and area characteristics of the existing Hybrid NoC-Bus mesh architecture. By defining a rule for routing algorithms called LastZ, the proposed area-efficient architecture decreases the overall average hop count of a NoC-based system compared to the existing architectures. Based on the proposed hybridization scheme, we present a low-power and high-performance 3D NoC architecture which enables congestion-aware and fault-tolerant inter-layer communication.
As discussed in Section I, a 6×6 router has many disadvantages compared with its 5×5 counterpart. For instance, based on the results reported in [5] for 90nm CMOS technology, a 6-port router incurs around 36% area and 20% power overheads compared to a 5-port one. In addition, the packet delay to cross the router increases because more input channels compete for access to the target output port. The larger crossbar also lengthens the router critical path and thereby reduces the maximum operating frequency of the router.
Our investigation shows that a straightforward hybridization of two different communication media (i.e., NoC and bus) without considering their intrinsic characteristics is not an efficient strategy. Fig. 1 shows the conventional hybridization style of a Hybrid NoC-Bus mesh-based system. In this architecture, stacked routers in different layers are connected to a vertical bus. The routers are able to serve as either a master or a slave depending on the arbitration decision. Surprisingly, as will be shown later, just by following one basic rule in the routing algorithm policy, it is possible to remove one input port and substitute 6×6 routers with 5×6 ones.
A. LastZ Rule
In a 3D NoC, the packet routing process is classified into two categories: intra-layer routing and inter-layer routing. In the 3D Hybrid NoC-Bus mesh architecture, intra-layer packet routing is multi-hop because a traditional NoC architecture is used for communication within a layer. In multi-hop communication, packet routing plays a crucial role because there are many minimal and non-minimal paths for sending a packet from a source to a destination. In contrast, for inter-layer communication, the 3D Hybrid NoC-Bus mesh architecture benefits from bus-based one-hop communication. In this work, we define a rule which we call LastZ.
Definition 1 (LastZ). A 3D routing algorithm is LastZ-based if the intra-layer routing process is completed before the inter-layer routing. In other words, in a LastZ-based routing algorithm, when a node $N_{source}$ sends a flit to a node $N_{destination}$, the flit first travels along the X or Y direction (statically or adaptively) in the $N_{source}$ dimension until $Flit_{xy} = Pillar_{xy}$, and then traverses the last hop in the Z direction.
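As an illustration of Definition 1, the following is a minimal sketch of a LastZ-based next-hop decision. It assumes deterministic XY routing inside a layer, which is only one of the static or adaptive in-plane schemes the definition permits. The function names, the direction labels, and the (x, y, layer) coordinate convention are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple

Coord = Tuple[int, int, int]  # (x, y, layer); hypothetical convention


def lastz_next_hop(current: Coord, dest: Coord) -> str:
    """Pick the next output direction under a LastZ-based rule.

    In-plane routing is plain XY here (an assumption). The vertical
    bus is only selected once the flit's (x, y) matches the pillar.
    """
    cx, cy, cz = current
    dx, dy, dz = dest
    if cx != dx:
        return "EAST" if dx > cx else "WEST"
    if cy != dy:
        return "NORTH" if dy > cy else "SOUTH"
    if cz != dz:
        return "BUS"  # single vertical hop, taken last
    return "LOCAL"    # source and destination share a layer


def lastz_route(src: Coord, dest: Coord) -> List[str]:
    """List the hops from src to dest; BUS or LOCAL is always final."""
    step = {"EAST": (1, 0), "WEST": (-1, 0), "NORTH": (0, 1), "SOUTH": (0, -1)}
    path: List[str] = []
    cur = src
    while True:
        hop = lastz_next_hop(cur, dest)
        path.append(hop)
        if hop in ("BUS", "LOCAL"):
            return path
        mx, my = step[hop]
        cur = (cur[0] + mx, cur[1] + my, cur[2])
```

For example, under these assumptions a flit from (0, 0, 0) to (2, 1, 2) first completes its X and Y hops in layer 0 and only then takes the bus.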
As will be shown later, this rule is astonishingly beneficial to improve system characteristics. It is noteworthy that this 
III. LASTZ-BASED HYBRIDIZATION ARCHITECTURE
Based on the defined LastZ rule, assume that a packet, after completing intra-layer routing, has reached the destination pillar (vertical bus). In this case, one of the routers connected to the vertical bus is necessarily the last router (hop) for delivering the packet to the target processing element (PE). Since the destination is already known, it is not wise to send the packet to that router again for a routing decision. Instead, it is more efficient to deliver the packet directly to the connected PE.
This scenario is the motivation for proposing a new hybridization scheme for connecting components to a vertical bus. The proposed architecture is shown in Fig. 2. As can be seen in the figure, it is practical to establish a more efficient inter-layer communication scheme without adding any extra workload or hardware to the bus arbiters. In the proposed hybridization architecture, routers serve only as masters that initiate transactions, while PEs play the slave role and receive their own packets directly via the intermediate buffers. The intermediate input buffer, which was previously used as an interface between a router and a bus, now connects a PE directly to the vertical bus. Bypassing the routers in this way enables a 3D NoC to use a 5×6 router instead of a larger router with six input ports.
IV. PARTIAL-LASTZ-BASED 3D NOC ARCHITECTURE
The LastZ-based hybridization offers many advantages, such as low-cost and high-speed routers, fast intra-layer and inter-layer packet transmission, reduced power consumption, and a high-throughput network. However, this architecture cannot support adaptivity for inter-layer communication. More precisely, in a fully LastZ-based network, the vertical hop must be taken as the last hop, which restricts routing adaptivity to intra-layer routing. Therefore, if a fully adaptive routing algorithm is desired in order to balance the load across all layers or to bypass a faulty vertical link, LastZ-based routing will not be efficient. We address this issue by presenting a partial-LastZ-based 3D NoC architecture.
As shown in Fig. 3, the partial-LastZ-based architecture is a combination of the typical Hybrid NoC-Bus 3D mesh and the LastZ-based 3D NoC architectures. In this architecture, a number of vertical buses are designated to follow the typical vertical bus architecture, while the others are still based on the proposed low-cost LastZ-based hybridization scheme. Consequently, the network consists of two types of routers: 6×6 routers connected to typical buses and 5×6 routers connected to LastZ-based buses. The partial-LastZ-based architecture has the advantage of adaptivity to handle congestion and faults while enhancing the network characteristics in terms of performance, power consumption, and area footprint.
To implement the routing mechanism for this architecture, the information about the typical bus nodes is stored in the network interface of each tile. The system uses the default LastZ-based routing algorithm in normal situations. If congestion occurs in particular layers or a vertical bus is faulty, the information stored in the network interfaces can be used to find the closest alternative path to reach the destination layer.
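The pillar selection described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the cost metric (in-plane Manhattan distance from the source to the pillar plus from the pillar to the destination), the tie-breaking order, and the names `choose_pillar`, `typical_pillars`, `faulty`, and `source_layer_congested` are all hypothetical. The function is meant only for packets whose source and destination layers differ.

```python
from typing import FrozenSet, Iterable, Optional, Tuple

XY = Tuple[int, int]


def manhattan(a: XY, b: XY) -> int:
    """In-plane hop distance between two (x, y) positions."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def choose_pillar(
    src_xy: XY,
    dst_xy: XY,
    typical_pillars: Iterable[XY],
    faulty: FrozenSet[XY] = frozenset(),
    source_layer_congested: bool = False,
) -> Optional[XY]:
    """Select the vertical bus for an inter-layer packet.

    Default: LastZ, i.e. use the destination's own pillar.
    Fallback: the non-faulty typical pillar with the smallest
    total in-plane detour (assumed metric), ties broken by (x, y).
    Returns None when no usable pillar exists.
    """
    if not source_layer_congested and dst_xy not in faulty:
        return dst_xy
    candidates = [p for p in typical_pillars if p not in faulty]
    if not candidates:
        return None
    return min(
        candidates,
        key=lambda p: (manhattan(src_xy, p) + manhattan(p, dst_xy), p),
    )
```

In this sketch the list of typical pillars plays the role of the information stored in each network interface; a packet that changes layers on a typical pillar would then finish its in-plane routing on the destination layer.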
V. EXPERIMENTAL RESULTS
To demonstrate the better performance, power, and area characteristics of the proposed 3D NoC, a cycle-accurate NoC simulation environment was implemented in HDL. The full LastZ-based 3D NoC-bus hybrid architecture [11] and the partial LastZ-based 3D NoC-bus hybrid architecture were analyzed for a synthetic traffic pattern. 5×5×3 meshes and packets with a length of eight flits were used for the simulations. The on-chip network considered in the experiments is formed by a typical state-of-the-art router structure including buffers, a routing unit, a switch allocator, VC allocators, and a crossbar. For the full LastZ-based architecture, all the routers have 5 input and 6 output ports. For the partial LastZ-based architecture, we used the typical bus architecture with 6×6 routers for 5 vertical pillars, while the other 20 pillars use the LastZ-based hybridization scheme, as shown in Fig. 4. We chose the locations of the typical pillars in such a way that the maximum distance from each source node to the closest typical pillar is not more than 2 hops. In addition, since traffic congestion commonly occurs at the center of the mesh, pillars closer to the center are more suitable options for providing inter-layer adaptivity.
[Figure: Top view of 5×5×3 partial LastZ-based 3D NoC-bus hybrid architecture with 5 typical vertical buses]
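The pillar-placement constraint stated for the experiments (every node at most 2 hops from a typical pillar) can be checked with a short script. The placement below is a hypothetical example chosen to satisfy the constraint with five pillars near the center; it is not claimed to be the placement of Fig. 4. The helper names are likewise assumptions.

```python
from itertools import product
from typing import Iterable, Tuple

XY = Tuple[int, int]

MESH = 5     # 5x5 nodes per layer, as in the experiments
LAYERS = 3   # 5x5x3 mesh

# Hypothetical placement of the 5 typical pillars (not taken from Fig. 4).
TYPICAL_PILLARS = {(1, 1), (1, 3), (2, 2), (3, 1), (3, 3)}


def max_distance_to_typical(mesh: int, pillars: Iterable[XY]) -> int:
    """Largest in-plane hop count from any node to its closest typical pillar."""
    pillars = list(pillars)
    return max(
        min(abs(x - px) + abs(y - py) for px, py in pillars)
        for x, y in product(range(mesh), repeat=2)
    )


def router_counts(mesh: int, layers: int, n_typical: int) -> Tuple[int, int]:
    """Return (number of 6x6 routers, number of 5x6 routers) in the network."""
    six_by_six = n_typical * layers
    five_by_six = (mesh * mesh - n_typical) * layers
    return six_by_six, five_by_six
```

With this placement the worst-case distance is 2 hops, and a 5×5×3 network with 5 typical pillars contains 15 routers of the 6×6 type and 60 of the 5×6 type.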
To perform the simulations under synthetic traffic profiles, an unbalanced traffic generation scenario was used. In this scenario, 40%, 35%, and 25% of the total network traffic is generated by the nodes located in Layer0, Layer1, and Layer2, respectively. Each node follows the uniform traffic pattern to distribute packets throughout the network. In the uniform traffic pattern, a node sends a packet to other nodes with an equal probability. The packet latencies were averaged over 50,000 packets. Latencies were not collected for the first 5,000 cycles to allow the network to stabilize. It was assumed the buffer size of each FIFO was eight flits, and the data width was set to 64 bits.
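The unbalanced traffic scenario can be sketched as a simple packet generator. This is an illustrative model only, since the simulations themselves were run in an HDL environment. It assumes that the source layer is drawn with the 40/35/25 weights, that the source node is uniform within that layer, and that the destination is uniform over all other nodes; `generate_packet` and `LAYER_SHARE` are hypothetical names.

```python
import random
from typing import Tuple

Coord = Tuple[int, int, int]  # (x, y, layer)

# Share of total traffic generated by Layer0, Layer1, and Layer2.
LAYER_SHARE = (0.40, 0.35, 0.25)


def generate_packet(rng: random.Random, mesh: int = 5) -> Tuple[Coord, Coord]:
    """Draw one (source, destination) pair for the unbalanced scenario."""
    layers = list(range(len(LAYER_SHARE)))
    # Source layer follows the unbalanced shares; node is uniform in the layer.
    src_layer = rng.choices(layers, weights=LAYER_SHARE)[0]
    src = (rng.randrange(mesh), rng.randrange(mesh), src_layer)
    # Uniform traffic: every other node is an equally likely destination.
    while True:
        dst = (rng.randrange(mesh), rng.randrange(mesh), rng.choice(layers))
        if dst != src:
            return src, dst
```

Passing a seeded `random.Random` instance keeps a run reproducible, which is convenient when comparing architectures under the same packet sequence.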
For the full LastZ-based 3D NoC-bus hybrid architecture, the (DyXY)Z [11] [12] wormhole routing algorithm was used, while we used the (DyXY)Z(DyXY) routing algorithm for the partial LastZ-based 3D NoC-bus hybrid architecture. Because this routing algorithm is not deadlock-free, we used routers with two virtual channels per input port. The packet latency versus average packet arrival rate for the different architectures under the uniform traffic profile is shown in Fig. 5. It can be observed for the mentioned scenario that the network with the partial LastZ-based architecture saturates at higher injection rates and always offers lower average packet latency than the full LastZ-based 3D NoC-bus hybrid architecture. The reason is that the partial LastZ-based architecture can balance the load among all layers better than the full LastZ-based architecture, owing to the routing adaptivity it offers.
The area of the different routers was computed after synthesis on CMOS 65nm LPLVT STMicroelectronics standard cells using Synopsys Design Compiler. To observe the area savings for more complex routers, we also synthesized routers supporting virtual channels. For these routers, we set the number of virtual channels to 2. The layout areas of a conventional 2D NoC router, the proposed LastZ-based NoC router, a conventional 3D NoC-Bus Hybrid router, a 3D Symmetric NoC router, and the wrapper are listed in Table I. For all the routers, the data width and buffer depth were set to 32 bits and 8 slots, respectively. The figures given in the table reveal that, compared to a conventional 3D NoC-Bus Hybrid router, the area savings for the proposed LastZ-based router are around 18% and 21% for the implementations without and with VCs, respectively. For more complex routers supporting a larger number of VCs, complex VC management techniques [13] [14], and a wider data width, even greater area savings are expected.
VI. CONCLUSION AND FUTURE WORK
In this paper, an efficient hybridization scheme was proposed to address the naive and straightforward hybridization between NoC and bus media in the 3D NoC-Bus Hybrid mesh architecture. The hybridization mechanism, which benefits from a rule called LastZ, enables a low-cost inter-layer communication architecture. To provide routing adaptivity for the presented scheme, we proposed a partial-LastZ-based 3D NoC-bus hybrid architecture. This architecture uses a combination of typical and LastZ-based bus architectures for inter-layer communication. Our simulations showed that the partial-LastZ-based architecture achieves up to 16% performance improvement compared to the full LastZ-based 3D NoC-bus hybrid architecture, while the LastZ-based router offers around 20% area reduction compared to the typical Hybrid NoC-Bus router. In the future, our work will be extended by performing a comprehensive simulation to estimate the system power and to measure NoC performance under realistic traces.
